Polypeptide nucleic sequences exported from mycobacteria, vectors comprising same and uses for diagnosing and preventing tuberculosis

ABSTRACT

Purified polynucleotides and polypeptides, and cells of M. smegmatis, M. bovis, M. bovis BCG, or M. africanum are provided.

The subject of the invention is novel recombinant screening, cloning and/or expression vectors which replicate in mycobacteria. Its subject is also a set of sequences encoding exported polypeptides which are detected by fusions with alkaline phosphatase and whose expression is regulated (induced or repressed) or constitutive during the ingestion of mycobacteria by macrophages. The invention also relates to a polypeptide, called DP428, of about 12 kD which corresponds to an exported protein found in mycobacteria belonging to the Mycobacterium tuberculosis complex. The invention also relates to a polynucleotide comprising a sequence encoding this polypeptide. It also relates to the use of the polypeptide or of fragments thereof and of the polynucleotides encoding the latter (or alternatively the polynucleotides complementary to the latter) for the production of means for detecting in vitro or in vivo the presence of a mycobacterium belonging to the Mycobacterium tuberculosis complex in a biological sample or for the detection of reactions of the host infected with these bacterial species. The invention finally relates to the use of the polypeptide or of fragments thereof as well as of the polynucleotides encoding the latter as means intended for the preparation of an immunogenic composition which is capable of inducing an immune response directed against the mycobacteria belonging to the Mycobacterium tuberculosis complex, or of a vaccine composition for the prevention and/or treatment of infections caused by mycobacteria belonging to said complex, in particular tuberculosis.

The aim of the present invention is also to use these sequences (polypeptide and polynucleotide sequences) as targets in the search for novel inhibitors of the growth and multiplication of mycobacteria and of their maintenance in the host, it being possible for these inhibitors to serve as antibiotics.

The genus Mycobacterium, which comprises at least 56 different species, includes major human pathogens such as M. leprae and M. tuberculosis, the agents responsible for leprosy and tuberculosis, which remain serious public health problems worldwide.

Tuberculosis continues to be a public health problem in the world. At present, this disease is the cause of 2 to 3 million deaths in the world and about 8 million new cases are observed each year (Bouvet, 1994). In developed countries, M. tuberculosis is the most common cause of mycobacterial infections. In France, about 10,000 new cases appear per year and, among the notifiable diseases, it is tuberculosis which comprises the highest number of cases. Vaccination with BCG (Bacille Calmette-Guérin), an avirulent strain which is derived from M. bovis and which is widely used as a vaccine against tuberculosis, is far from being effective in all populations. This efficacy varies from about 80% in western countries such as England, to 0% in India (results of the last vaccination trial in Chingleput, published in 1972 in Indian J. Med. Res.). Furthermore, the appearance of M. tuberculosis strains which are resistant to antituberculars and the increased risk in immunosuppressed patients, such as patients suffering from AIDS, of developing tuberculosis, make the development of rapid, specific and reliable methods for the diagnosis of tuberculosis and the development of novel vaccines necessary. For example, an epidemiological study carried out in Florida, the results of which were published in 1993 in AIDS therapies, showed that 10% of AIDS patients are affected by tuberculosis at the time of the AIDS diagnosis or 18 months before it. In these patients, tuberculosis appears in 60% of cases in a form which is disseminated and therefore nondetectable by conventional diagnostic criteria such as pulmonary radiography or the analysis of sputum.

Currently, a definitive diagnosis, provided by the detection of culturable bacilli in a sample obtained from a patient, is obtained in fewer than half of tuberculosis cases, even in the case of pulmonary tuberculosis. The diagnosis of tuberculosis and of the other related mycobacterial infections is therefore difficult to carry out for various reasons: mycobacteria are often present in a small quantity, their generation time is very long (24 h for M. tuberculosis) and they are difficult to culture (Bates et al., 1986).

Other techniques can be used in clinical medicine to identify a mycobacterial infection:

a) The direct identification of microorganisms under a microscope; this technique is rapid, but does not allow the identification of the mycobacterial species observed and lacks sensitivity (Bates, 1979).

Cultures, when they are positive, have a specificity approaching 100% and allow the identification of the mycobacterial species isolated; however, as specified above, the growth of mycobacteria in vitro is slow (it requires 3 to 6 weeks of repeated cultures (Bates, 1979; Bates et al., 1986)) and expensive.

b) Serological techniques are found to be useful under certain conditions, but their use is sometimes limited by their low sensitivity and/or specificity (Daniel et al., 1987).

c) The presence of mycobacteria in a biological sample can also be determined by molecular hybridization with DNA or RNA using oligonucleotide probes which are specific for the sequences tested for (Kiehn et al., 1987; Roberts et al., 1987; Drake et al., 1987). Several studies have shown the advantage of this technique for the diagnosis of mycobacterial infections. The probes used consist of DNA, ribosomal RNA or DNA fragments from mycobacteria which are obtained from gene banks. The principle of these techniques is based on the polymorphism of the nucleotide sequences of the fragments used or on the polymorphism of the adjacent regions. In all cases, they require the use of cultures and are not directly applicable to biological samples.

The low quantity of mycobacteria present in a biological sample, and consequently the low quantity of target DNA to be detected in this sample, can require the specific in vitro amplification of the target DNA, using techniques such as PCR (polymerase chain reaction), before its detection with the aid of a nucleotide probe. The specific amplification of the DNA by the PCR technique can constitute the first stage of a method for detecting the presence of a mycobacterial DNA in a biological sample, the actual detection of the amplified DNA being carried out in a second stage with the aid of an oligonucleotide probe capable of specifically hybridizing with the amplified DNA.
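As an illustration of the amplification stage, the expected amplified fragment can be predicted from the target sequence and a primer pair before any experiment is run. The following is a minimal sketch only: it assumes exact primer matches on a linear template, and the function names and the toy sequences are hypothetical and are not taken from any sequence of the invention.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def predict_amplicon(template: str, forward: str, reverse: str):
    """Return the predicted amplicon, or None if a primer has no exact site.

    Assumption: exact matching only; mismatches, secondary structure and
    multiple priming sites are ignored in this sketch.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand, so its reverse
    # complement is the sequence that appears on the top strand.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Toy sequences, invented for illustration only.
template = "TTGACCGATGCATTAGGCTAACGGTTCAAGCTTGGA"
amplicon = predict_amplicon(template, "GACCGATG", "CCAAGCTTGA")
print(len(amplicon), amplicon)
```

Comparing the predicted length with the length observed on a gel is one simple way to notice the kind of discrepancy between expected and obtained fragment sizes that makes the choice of primers critical.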

A test for the detection of mycobacteria belonging to the Mycobacterium tuberculosis complex, by sandwich hybridization (test using a capture probe and a detection probe) was described by Chevrier et al. in 1993. The Mycobacterium tuberculosis complex is a group of mycobacteria which comprises M. bovis-BCG, M. bovis, M. tuberculosis, M. africanum and M. microti.

A method for the detection of low quantities of mycobacteria belonging to the tuberculosis complex, by gene amplification and direct hybridization on biological samples, has been developed. Said method uses the insertion sequence IS6110 (European Patent EP 0,490,951 B1). Thierry et al. described in 1990 a sequence which is specific to the Mycobacterium tuberculosis complex and which is called IS6110. Some authors have proposed specifically amplifying the DNA obtained from Mycobacterium using nucleic primers in an amplification method, such as the polymerase chain reaction (PCR). Patel et al. described in 1990 the use of several nucleic primers chosen from a sequence known as a probe in the identification of M. tuberculosis. However, the length of the fragments obtained using these primers was different from the expected theoretical length and several fragments of variable size were obtained. Furthermore, the authors observed the absence of hybridization of the amplified products with the plasmid which served to determine the primers. These results indicate that these primers might not be appropriate in the detection of the presence of M. tuberculosis in a biological sample and confirm the critical nature of the choice of the primers. The same year, J. L. Guesdon and D. Thierry described a method for the detection of M. tuberculosis, having a high sensitivity, by amplification of an M. tuberculosis DNA fragment located within the IS6110 sequence (European Patent EP 461,045) with the aid of primers generating amplified DNA fragments of constant length, even when the choice of the primers led to the amplification of long fragments (of the order of 1000 to 1500 bases) where the risk of interruption of the polymerization is high because of the effects of the secondary structure of the sequence. Other primers specific for the IS6110 sequence are described in European Patent No. EP-0,490,951.

The inventors have shown (unpublished results) that some clinical isolates of Mycobacterium tuberculosis lacked the insertion sequence IS6110 and could therefore not be detected with the aid of oligonucleotides specific for this sequence, which could thus lead to false-negative diagnostic results. These results confirm a similar observation made by Yuen et al. in 1993. The impossibility of detecting these pathogenic strains which are potentially present in a biological sample collected from a patient is thus likely to lead to diagnostic difficulties or even to diagnostic errors. The availability of several sequences specific for the tubercle bacillus, within which primers appropriate for amplification can be chosen, is therefore important. The DP428 sequence described here may be used for this purpose.

M. bovis and M. tuberculosis, the causative agents of tuberculosis, are facultative intracellular bacteria.

These agents have developed mechanisms to ensure their survival and their replication inside macrophages, one of the cell types which is supposed to eradicate invading microorganisms. These agents are capable of modulating the normal development of their phagosome and of preventing it from becoming differentiated into an acidic compartment rich in hydrolases (Clemens, 1979; Clemens et al., 1996; Sturgill-Koszycki et al., 1994 and Xu et al., 1994). However, this modulation is only possible if the bacterium is alive inside the phagosome, suggesting that compounds which are actively synthesized and/or secreted inside the cell are part of this mechanism. Exported proteins are probably involved in this mechanism. Despite major health problems linked to these pathogenic organisms, little is known about their exported and/or secreted proteins. SDS-PAGE analyses of M. tuberculosis culture filtrate show at least 30 secreted proteins (Altschul et al., 1990; Nagal et al., 1991 and Young et al., 1992). Some of them have been characterized, their genes cloned and sequenced (Borremans et al., 1989; Wiker et al., 1992 and Yamaguchi et al., 1989). Others, although being immunodominant antigens of major importance for inducing a protective immunity (Anderson et al., 1991 and Orme et al., 1993), have not been completely identified. In addition, it is probable that many exported proteins remain attached to the cell membrane and are consequently not present in the culture supernatants. It has been shown that the proteins located at the outer surface of various pathogenic bacteria, such as the 103 kDa invasin of Yersinia pseudotuberculosis (Isberg et al., 1987) or the 80 kDa internalin of Listeria monocytogenes (Gaillard et al., 1991 and Dramsi et al., 1997) play an important role in the interactions with the host cells and, consequently, in the pathogenicity as well as in the induction of protective responses. Thus, a protein which is bound to the membrane would be important for the M. tuberculosis infection as well as for the induction of a protective response against this infection. These proteins could certainly be of interest for the preparation of vaccines.

Recently, the adaptation, to mycobacteria, of a genetic methodology for the identification and the phenotypic selection of exported proteins has been described (Lim et al., 1995). This method uses E. coli periplasmic alkaline phosphatase (PhoA). A plasmid vector was constructed which allows the fusion of genes between a truncated PhoA gene and genes encoding exported proteins (Manoil et al., 1990).

Using this method, it has been possible to identify an M. tuberculosis gene (erp (Berthet et al., 1995)) exhibiting homologies with a 28 kDa exported protein of M. leprae, which is a frequent target of humoral responses of the lepromatous form of leprosy. A protein having amino acid motifs which are characteristic of plant desaturase (des) has also been characterized by the technique of fusion with PhoA.

However, this genetic method for identifying exported proteins does not make it possible to easily evaluate the intracellular expression of the corresponding genes. Such an evaluation is of crucial importance both for selecting good candidate vaccines and for understanding the interactions between bacteria and their host cells. The induction of the expression of virulence factors through contact between the pathogen and its target cell has been described. This is the case, for example, for the Yersinia pseudotuberculosis Yops virulence factors (Petersson et al., 1996). Shigella, upon contact with the target cells, releases the Ipa proteins into the culture medium, and Salmonella synthesizes novel surface structures.

Taking into account the preceding text, a great need currently exists for developing novel vaccines against pathogenic mycobacteria as well as novel specific, reliable and rapid diagnostic tests. These developments require the designing of even more efficient specific tools which make it possible, on the one hand, to isolate or to obtain sequences of novel specific, in particular immunogenic, polypeptides, and, on the other hand, to better understand the mechanism of the interactions between bacteria and their host cells such as in particular the induction of the expression of virulence factors. This is precisely the object of the present invention.

The inventors have defined and produced, for this purpose, novel vectors allowing the screening, cloning and/or expression of mycobacterial DNA sequences so as to identify, among these sequences, nucleic acids encoding proteins of interest, preferably exported proteins, which may be located on the bacterial membrane, and/or secreted proteins, and to identify among these sequences those which are induced or repressed during infection (intracellular growth).

DESCRIPTION

The present invention describes the use of the reporter gene phoA in mycobacteria. It makes it possible to identify systems for expression and export in a mycobacterial context. Many genes are only expressed in such a context, which shows the advantage of the present invention. During the cloning of DNA segments of strains of the M. tuberculosis complex fused with phoA into another mycobacterium such as M. smegmatis, the beginning of the gene, its regulatory regions and its regulator will be cloned, which will make it possible to observe a regulation. If this regulation is positive, the cloning of the regulator will constitute an advantage for observing the expression and the export.

In the context of the invention, mycobacterium is understood to mean all the mycobacteria belonging to the various species listed by Wayne L. G. and Kubica G. P. (1980). Family Mycobacteriaceae in Bergey's manual of systematic bacteriology, J. P. Butler Ed. (Baltimore USA: Williams and Wilkins P. 1436-1457).

In some cases, the cloned genes are subjected in their original host to a negative regulation which makes the observation of the expression and of the export difficult in the original host. In this case, the cloning of the gene in the absence of its negative regulator, into a host not containing it, will constitute an advantage.

The invention also relates to novel mycobacterial polypeptides and to novel mycobacterial polynucleotides which may have been isolated by means of the preceding vectors and which are capable of entering into the preparation of compositions for the detection of a mycobacterial infection, or for the protection against an infection caused by mycobacteria or for the search for inhibitors as is described above for DP428.

The subject of the invention is therefore a recombinant screening, cloning and/or expression vector, characterized in that it replicates in mycobacteria and in that it contains:

1) a replicon which is functional in mycobacteria;

2) a selectable marker;

3) a reporter cassette comprising:

a) a multiple cloning site (polylinker),

b) optionally a transcription terminator which is active in mycobacteria, upstream of the polylinker,

c) a coding nucleotide sequence which is derived from a gene encoding a protein expression, export and/or secretion marker, said nucleotide sequence lacking its initiation codon and its regulatory sequences, and

d) a coding nucleotide sequence derived from a gene encoding a marker for the activity of promoters which are contained in the same fragment, said nucleotide sequence being provided with its initiation codon.

Optionally, the recombinant vector also contains a replicon which is functional in E. coli.

Preferably, the export and/or secretion marker is placed in the same orientation as the promoter activity marker.

Preferably, the recombinant screening vector according to the invention comprises, in addition, a transcription terminator placed downstream of the promoter activity marker, which is likely to allow the production of short transcripts which are found to be more stable and which consequently allow a higher level of expression of the products of translation.

The export and/or secretion marker is a nucleotide sequence whose expression, followed by export and/or secretion, depends on the regulatory elements which control its expression.

“Sequences or elements for regulating the expression of the production of polypeptides and its location” is understood to mean a transcriptional promoter sequence, a sequence comprising the ribosome-binding site (RBS), and the sequences responsible for export and/or secretion such as the sequence termed signal sequence.

A first advantageous export and/or expression marker is a coding sequence derived from the phoA gene. Where appropriate, it is truncated such that the alkaline phosphatase activity is nevertheless capable of being restored when the truncated coding sequence is placed under the control of a promoter and of appropriate regulatory elements.

Other expression, export and/or secretion markers may be used. There may be mentioned, by way of examples, a sequence of the gene for β-agarase, for the nuclease of a staphylococcus or for a β-lactamase.

Among the advantageous markers for the activity of promoters which are contained in the same fragment, a coding sequence derived from the firefly luciferase luc gene, provided with its initiation codon, is preferred.

Other markers for the activity of promoters which are contained in the same fragment may be used. There may be mentioned, by way of example, a sequence of the gene for GFP (Green Fluorescent Protein).

The transcription terminator should be functional in mycobacteria. An advantageous terminator is, in this regard, the T4 coliphage terminator (tT4). Other appropriate terminators for carrying out the invention may be isolated using the technique presented in the examples, for example by means of an “omega” cassette (Prentki et al., 1984).

A vector which is particularly preferred for carrying out the invention is a plasmid chosen from the following plasmids which have been deposited at the CNCM (Collection Nationale de Cultures de Microorganismes, rue du Docteur Roux, 75724 Paris cedex 15, France):

a) pJVEDa which was deposited at the CNCM under the No. I-1797, on Dec. 12, 1996,

b) pJVEDb which was deposited at the CNCM under the No. I-1906, on 25 Jul. 1997,

c) pJVEDc which was deposited at the CNCM under the No. I-1799, on Dec. 12, 1996.

For the selection or the identification of mycobacterial nucleic acid sequences encoding polypeptides which are capable of being incorporated into immunogenic or antigenic compositions for the detection of an infection, or which are capable of inducing or repressing a mycobacterial virulence factor, the vector of the invention will comprise, at one of the multiple cloning sites of the polylinker, a nucleotide sequence of a mycobacterium in which the detection is carried out of the presence of sequences corresponding to exported and/or secreted polypeptides which may be induced or repressed during the infection, or alternatively expressed or produced constitutively, their associated promoter and/or regulatory sequences which are capable of allowing or promoting the export and/or the secretion of said polypeptides of interest, or all or part of the genes of interest encoding said polypeptides.

Preferably, this sequence is obtained by physical fragmentation or by enzymatic digestion of the genomic DNA or of the DNA which is complementary to an RNA of a mycobacterium, preferably M. tuberculosis or chosen from M. africanum, M. bovis, M. avium or M. leprae.

The vectors of the invention may indeed also be used to determine the presence of sequences of interest, preferably corresponding to exported and/or secreted proteins, and/or capable of being induced or repressed or produced constitutively during the infection, in particular during phagocytosis by the macrophages, and, according to what was previously disclosed, in mycobacteria such as M. africanum, M. bovis, M. avium or M. leprae whose DNA or cDNA will have been treated by physical fragmentation or with defined enzymes.

According to a first embodiment of the invention, the enzymatic digestion is carried out on the genomic DNA or the complementary DNA of M. tuberculosis.

Preferably, this DNA is digested with an enzyme such as Sau3A, BclI or BglII.

Other restriction enzymes such as ScaI, ApaI, SacI or KpnI or alternatively nucleases or polymerases can naturally be used as long as they allow the production of fragments whose ends can be inserted into one of the cloning sites of the polylinker of the vector of the invention.

Where appropriate, the digestions with various enzymes will be carried out simultaneously.
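The fragmentation step can be illustrated in silico. The sketch below is a simplified model under stated assumptions: it treats the DNA as a single linear top strand, uses the commonly documented recognition sites of the three preferred enzymes (Sau3A: ^GATC, BglII: A^GATCT, BclI: T^GATCA), and ignores methylation sensitivity and partial digestion. The function name `digest` and the toy sequence are hypothetical.

```python
import re

# Recognition site and cut position (offset from the start of the site)
# on the top strand; all three enzymes leave the same GATC overhang.
ENZYMES = {
    "Sau3A": ("GATC", 0),
    "BglII": ("AGATCT", 1),
    "BclI": ("TGATCA", 1),
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Return the fragments of a complete digestion of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    sequence = sequence.upper()
    # A lookahead is used so that overlapping sites would also be found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Drop empty fragments that arise when a cut falls at either end.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


toy = "AAGATCTTTGATCAGG"  # invented sequence, for illustration only
for name in ENZYMES:
    print(name, digest(toy, name))
```

Because the three enzymes generate the same cohesive end, fragments produced by any of them could in principle be ligated into the same compatible cloning site of the polylinker; this point is an inference from the recognition sites, not a statement taken from the examples.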

Recombinant vectors which are preferred for carrying out the invention are chosen from the following recombinant vectors which have been deposited at the CNCM:

a) p6D7 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1814,

b) p5A3 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1815,

c) p5F6 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1816,

d) p2A29 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1817,

e) pDP428 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1818,

f) p5B5 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1819,

g) p1C7 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1820,

h) p2D7 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1821,

i) p1B7 which was deposited on 31 Jan. 1997 at the CNCM under the No. I-1843,

j) pJVED/M. tuberculosis which was deposited on 25 Jul. 1997 at the CNCM under the No. I-1907,

k) pM1C25 which was deposited on 4 Aug. 1998 at the CNCM under the No. I-2062.

Among these, the recombinant vector pDP428 which was deposited on 28 Jan. 1997 at the CNCM under the No. I-1818, and the vector pM1C25 which was deposited on 4 Aug. 1998 at the CNCM under the No. I-2062 are most preferred.

The subject of the invention is also a method of screening nucleotide sequences derived from mycobacteria in order to determine the presence of sequences corresponding to exported and/or secreted polypeptides which may be induced or repressed during the infection, their associated promoter and/or regulatory sequences which are capable in particular of allowing or promoting the export and/or secretion of said polypeptides of interest, or all or part of genes of interest encoding said polypeptides, characterized in that it uses a recombinant vector according to the invention.

The invention also relates to a method of screening, according to the invention, characterized in that it comprises the following steps:

a) physical fragmentation of the mycobacterial DNA sequences or their digestion with at least one defined enzyme and recovery of the fragments obtained;

b) insertion of the fragments obtained in step a) into a cloning site, which is compatible, where appropriate, with the enzyme of step a), of the polylinker of a vector according to the invention;

c) if necessary, amplification of said fragments contained in the vector, for example by replication of the latter after insertion of the vector thus modified into a defined cell, preferably E. coli;

d) transformation of the host cells with the vector amplified in step c), or in the absence of amplification, with the vector of step b);

e) culture of the transformed host cells in a medium allowing the detection of the export and/or secretion marker, and/or of the promoter activity marker which is contained in the vector;

f) detection of the host cells which are positive (positive colonies) for the expression of the export and/or secretion marker, and/or of the promoter activity marker;

g) isolation of the DNA from the positive colonies and insertion of this DNA into a cell which is identical to that in step c);

h) selection of the inserts contained in the vector, allowing the production of clones which are positive for the export and/or secretion marker, and/or for the promoter activity marker;

i) isolation and characterization of the mycobacterial DNA fragments contained in these inserts.

In one of the preferred embodiments of the screening method according to the invention, the host cells, detected in step f), which are positive for the export and/or secretion marker are, optionally in a second stage, tested for the capacity of the selected nucleotide insert to stimulate the expression of the promoter activity marker when said host cells are phagocytosed by macrophage-type cells.

More specifically, the stimulation of the expression of the promoter activity marker in host cells placed in axenic culture (host cells alone in culture) is compared with the stimulation of the expression of the promoter activity marker in host cells cultured in the presence of macrophages and which are thus phagocytosed by the latter.
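The comparison described above can be expressed as a simple ratio. The following sketch is illustrative only: it assumes that the promoter activity marker yields a positive numeric signal (for example luciferase light units normalized per viable bacterium), and the two-fold threshold, the function name and the example values are arbitrary assumptions rather than values taken from the invention.

```python
def classify_expression(axenic_signal: float,
                        intracellular_signal: float,
                        fold_threshold: float = 2.0) -> str:
    """Classify an insert from its promoter activity in the two conditions.

    Both signals are assumed to be normalized per bacterium beforehand.
    The fold_threshold is an arbitrary cut-off chosen for this sketch.
    """
    if axenic_signal <= 0 or intracellular_signal <= 0:
        raise ValueError("signals must be positive")
    ratio = intracellular_signal / axenic_signal
    if ratio >= fold_threshold:
        return "induced"
    if ratio <= 1 / fold_threshold:
        return "repressed"
    return "constitutive"


# Invented example values, for illustration only.
for axenic, intracellular in [(100.0, 450.0), (100.0, 30.0), (100.0, 120.0)]:
    print(axenic, intracellular, classify_expression(axenic, intracellular))
```

In practice the threshold would be set from replicate measurements and their variability; a fixed cut-off is used here only to keep the example short.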

The selection of host cells which are positive for the promoter activity marker can be carried out immediately after step e) of the method of screening described above, or alternatively after any one of steps f), g), h) or i), that is to say once the host cells have been positively selected for the export and/or secretion marker.

The use of this method allows the construction of DNA libraries comprising sequences corresponding to polypeptides which are capable of being exported and/or secreted, and/or which are capable of being induced or repressed during the infection when they are produced inside recombinant mycobacteria. Step i) of the method may comprise a step for sequencing the inserts selected.

Preferably, in the method according to the invention, the vector used is chosen from the plasmids pJVEDa (CNCM, No. I-1797), pJVEDb (CNCM, No. I-1906), pJVEDc (CNCM, No. I-1799) or pJVED/M. tuberculosis (CNCM, No. I-1907), and the digestion of the mycobacterial DNA sequences is carried out by means of the enzyme Sau3A.

According to a preferred embodiment of the invention, the method of screening is characterized in that the mycobacterial sequences are derived from a pathogenic mycobacterium, for example from M. tuberculosis, M. bovis, M. avium, M. africanum or M. leprae.

The invention also comprises a library of genomic DNA or of cDNA which is complementary to mycobacterial mRNA, characterized in that it is obtained by a method comprising steps a) and b) or a), b) and c) of the preceding method according to the invention, preferably a library of genomic DNA or of cDNA which is complementary to mRNA of pathogenic mycobacteria, preferably of mycobacteria belonging to the Mycobacterium tuberculosis complex group, preferably of Mycobacterium tuberculosis.

In the present invention, “nucleic sequences” or “amino acid sequences” are understood to designate SEQ ID No. X to SEQ ID No. Y, where X and Y may independently represent a number or an alphanumeric character, that is, respectively the set of nucleic sequences or the set of amino acid sequences represented by figures X to Y, ends included.

For example, the nucleic sequences or the amino acid sequences SEQ ID NOS: 1-87 are respectively the nucleic sequences or the amino acid sequences represented by FIGS. 1 to 4N.

The subject of the invention is also the nucleotide sequences of mycobacteria or comprising nucleotide sequences of mycobacteria selected after carrying out the method according to the invention which is described above.

Preferably, said mycobacterium is chosen from M. tuberculosis, M. bovis, M. africanum, M. avium, M. leprae, M. paratuberculosis, M. kansasii or M. xenopi.

The nucleotide sequences of mycobacteria or comprising a mycobacterialnucleotide sequence are preferred, said mycobacterial nucleotidesequence being chosen from the sequences of mycobacterial DNA fragmentshaving the nucleic sequences SEQ ID NOS: 1, 8, 14, 25, 31, 33, 35, 41,46, 52, 56, 62, 64, 67, 69, 72, 74, 76, 78, 81, 84, 86, 88, 90, 92, 96,98, 100, 104, 106, 108, 110, 113, 119, 122, 128, 133, 137, 139, 141,143, 145, 148, 150, 152, 154, 156, 158, 160, 162, 165, 169, 177, 184,189, 195, 200, 202, 206, 209, 211, 213, 217, 220, 225, 228, 238, 246,250, 255, 258, 260, 262, 268, 274, 278, 280, 282, 284, 286, 288, 290,297, 310, 317, 321, 323, 325, 327, 331, 333, 335, 337, 339, 346, 347,353, 357, 359, 361, 364, 368, 371, 374, 380, 383, 385, 387, 389, 393,395, 397, 399, 403, 405, 407, 410, 412, 419, 421, 426, 429, 431, 433,437, 441, 447, 452, 456, 459, 461, 463, 469, 472, 474, 476, 482, 485,487, 489, 495, 497, 501, 505, 510, 516, 519, 522, 530, 534, 537, 544,546, 550, 552, 554, 556, 558, 564, 569, 571, 573, 576, 580, 584, 586,588, 590, 594, 596, 598, 600, 604, 608, 610, 612, 614, 616, 618, 620,622, 624, 626, 629, 631, 633, 635, 640, 647, 649, 651, 653, 657, 660,662, 664, 666, 669, 674, 676, 678, 683, 686, 691, 693, 695, 697, 702,717, 728, 733, 736, 739, 741, 743, 746, 752, 755, 757, 759, 761, 764,767, 769, 771, 784, 794, 805, 807, 809, 811, 813, 817, 821, 823, 825,827, 831, 833, 835, 837, 839, 842, 844, 846, 848, 864, 878, 883, 885,887, 895, 901, 907, and 909, which are represented respectively by FIGS.1 to 24C (plates 1 to 150), by FIGS. 27A to 27C (plates 152 to 154), byFIG. 29 (plate 156) and by FIGS. 31A to 50F (plates 158 to 275).

According to a specific embodiment of the invention, preferred sequences are, for example, the mycobacterial DNA fragments having the sequence SEQ ID NO: 1, which is contained in the vector pDP428 (CNCM, No. I-1818), SEQ ID NO: 41, which is contained in the vector p6D7 (CNCM, No. I-1814), SEQ ID NOS: 88 and 96, which are contained in the vector p5F6 (CNCM, No. I-1816), SEQ ID NO: 110, which is contained in the vector p2A29 (CNCM, No. I-1817), SEQ ID NO: 122, which is contained in the vector p5B5 (CNCM, No. I-1819), SEQ ID NOS: 137 and 143, which are contained in the vector p1C7 (CNCM, No. I-1820), SEQ ID NO: 158, which is contained in the vector p2D7 (CNCM, No. I-1821), SEQ ID NO: 165, which is contained in the vector p1B7 (CNCM, No. I-1843), SEQ ID NO: 530, which is contained in the vector p5A3 (CNCM, No. I-1815), or SEQ ID NO: 544, which is contained in the vector pM1C25 (CNCM, No. I-2062).

The invention also relates to a nucleic acid comprising the entire openreading frame of one of the nucleotide sequences according to theinvention, in particular one of the sequences SEQ ID NOS: 1, 8, 14, 25,31, 33, 35, 41, 46, 52, 56, 62, 64, 67, 69, 72, 74, 76, 78, 81, 84, 86,88, 90, 92, 96, 98, 100, 104, 106, 108, 110, 113, 119, 122, 128, 133,137, 139, 141, 143, 145, 148, 150, 152, 154, 156, 158, 160, 162, 165,169, 177, 184, 189, 195, 200, 202, 206, 209, 211, 213, 217, 220, 225,228, 238, 246, 250, 255, 258, 260, 262, 268, 274, 278, 280, 282, 284,286, 288, 290, 297, 310, 317, 321, 323, 325, 327, 331, 333, 335, 337,339, 346, 347, 353, 357, 359, 361, 364, 368, 371, 374, 380, 383, 385,387, 389, 393, 395, 397, 399, 403, 405, 407, 410, 412, 419, 421, 426,429, 431, 433, 437, 441, 447, 452, 456, 459, 461, 463, 469, 472, 474,476, 482, 485, 487, 489, 495, 497, 501, 505, 510, 516, 519, 522, 530,534, 537, 544, 546, 550, 552, 554, 556, 558, 564, 569, 571, 573, 576,580, 584, 586, 588, 590, 594, 596, 598, 600, 604, 608, 610, 612, 614,616, 618, 620, 622, 624, 626, 629, 631, 633, 635, 640, 647, 649, 651,653, 657, 660, 662, 664, 666, 669, 674, 676, 678, 683, 686, 691, 693,695, 697, 702, 717, 728, 733, 736, 739, 741, 743, 746, 752, 755, 757,759, 761, 764, 767, 769, 771, 784, 794, 805, 807, 809, 811, 813, 817,821, 823, 825, 827, 831, 833, 835, 837, 839, 842, 844, 846, 848, 864,878, 883, 885, 887, 895, 901, 907, and 909 according to the invention.

Said nucleic acid may be isolated, for example, in the following manner:

a) preparation of a cosmid library from the M. tuberculosis DNA, for example according to the technique described by Jacobs et al., 1991;

b) hybridization of all or part of a probe nucleic acid having thesequences chosen, for example, from SEQ ID NOS: 1, 8, 14, 25, 31, 33,35, 41, 46, 52, 56, 62, 64, 67, 69, 72, 74, 76, 78, 81, 84, 86, 88, 90,92, 96, 98, 100, 104, 106, 108, 110, 113, 119, 122, 128, 133, 137, 139,141, 143, 145, 148, 150, 152, 154, 156, 158, 160, 162, 165, 169, 177,184, 189, 195, 200, 202, 206, 209, 211, 213, 217, 220, 225, 228, 238,246, 250, 255, 258, 260, 262, 268, 274, 278, 280, 282, 284, 286, 288,290, 297, 310, 317, 321, 323, 325, 327, 331, 333, 335, 337, 339, 346,347, 353, 357, 359, 361, 364, 368, 371, 374, 380, 383, 385, 387, 389,393, 395, 397, 399, 403, 405, 407, 410, 412, 419, 421, 426, 429, 431,433, 437, 441, 447, 452, 456, 459, 461, 463, 469, 472, 474, 476, 482,485, 487, 489, 495, 497, 501, 505, 510, 516, 519, 522, 530, 534, 537,544, 546, 550, 552, 554, 556, 558, 564, 569, 571, 573, 576, 580, 584,586, 588, 590, 594, 596, 598, 600, 604, 608, 610, 612, 614, 616, 618,620, 622, 624, 626, 629, 631, 633, 635, 640, 647, 649, 651, 653, 657,660, 662, 664, 666, 669, 674, 676, 678, 683, 686, 691, 693, 695, 697,702, 717, 728, 733, 736, 739, 741, 743, 746, 752, 755, 757, 759, 761,764, 767, 769, 771, 784, 794, 805, 807, 809, 811, 813, 817, 821, 823,825, 827, 831, 833, 835, 837, 839, 842, 844, 846, 848, 864, 878, 883,885, 887, 895, 901, 907, 909, with the cosmids of the library previouslyprepared in step a);

c) selection of the cosmids hybridizing with the probe nucleic acid of step b);

d) sequencing of the DNA inserts of the clones selected in step c) and identification of the complete open reading frame;

e) where appropriate, cloning of the inserts sequenced in step d) into an appropriate expression and/or cloning vector.

The nucleic acids comprising the entire open reading frame of thesequences SEQ ID NOS: 1, 8, 14, 25, 31, 33, 35, 41, 46, 52, 56, 62, 64,67, 69, 72, 74, 76, 78, 81, 84, 86, 88, 90, 92, 96, 98, 100, 104, 106,108, 110, 113, 119, 122, 128, 133, 137, 139, 141, 143, 145, 148, 150,152, 154, 156, 158, 160, 162, 165, 169, 177, 184, 189, 195, 200, 202,206, 209, 211, 213, 217, 220, 225, 228, 238, 246, 250, 255, 258, 260,262, 268, 274, 278, 280, 282, 284, 286, 288, 290, 297, 310, 317, 321,323, 325, 327, 331, 333, 335, 337, 339, 346, 347, 353, 357, 359, 361,364, 368, 371, 374, 380, 383, 385, 387, 389, 393, 395, 397, 399, 403,405, 407, 410, 412, 419, 421, 426, 429, 431, 433, 437, 441, 447, 452,456, 459, 461, 463, 469, 472, 474, 476, 482, 485, 487, 489, 495, 497,501, 505, 510, 516, 519, 522, 530, 534, 537, 544, 546, 550, 552, 554,556, 558, 564, 569, 571, 573, 576, 580, 584, 586, 588, 590, 594, 596,598, 600, 604, 608, 610, 612, 614, 616, 618, 620, 622, 624, 626, 629,631, 633, 635, 640, 647, 649, 651, 653, 657, 660, 662, 664, 666, 669,674, 676, 678, 683, 686, 691, 693, 695, 697, 702, 717, 728, 733, 736,739, 741, 743, 746, 752, 755, 757, 759, 761, 764, 767, 769, 771, 784,794, 805, 807, 809, 811, 813, 817, 821, 823, 825, 827, 831, 833, 835,837, 839, 842, 844, 846, 848, 864, 878, 883, 885, 887, 895, 901, 907,909, are among the preferred nucleic acids.

The present invention makes it possible to determine a gene fragment encoding an exported polypeptide. Comparison with the genome sequence published by Cole et al. (Cole et al., 1998, Nature, 393, 537-544) makes it possible to determine the whole gene carrying the sequence identified according to the present invention.

Nucleotide sequence comprising the entire open reading frame of a sequence according to the invention is understood to mean the nucleotide sequence (genomic, cDNA, semisynthetic or synthetic) comprising one of the sequences according to the invention and extending, on the one hand, 5′ of these sequences up to the first codon for initiation of translation (ATG or GTG) or even up to the first stop codon, and, on the other hand, 3′ of these sequences up to the next stop codon, this being in any one of the three possible reading frames.
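By way of illustration only, the extension described above can be sketched in Python. The function name `extend_to_orf`, its parameters and the toy sequence are hypothetical and are not part of the invention; the sketch assumes an upper-case DNA string, a fragment given by 0-based half-open coordinates, and a fragment length that is a multiple of three so that the fragment itself fixes the reading frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # initiation codons named in the text


def extend_to_orf(genome, start, end, upstream_to_stop=False):
    """Extend an in-frame fragment genome[start:end] to a full open reading frame.

    Upstream, the walk stops at the nearest in-frame initiation codon
    (ATG or GTG) or, if upstream_to_stop is True, just after the nearest
    in-frame stop codon. Downstream, the walk stops at the next in-frame
    stop codon, which is included. Returns (orf_start, orf_end), 0-based
    half-open.
    """
    if (end - start) % 3 != 0:
        raise ValueError("fragment length must be a multiple of 3")

    five = start
    pos = start - 3
    while pos >= 0:
        codon = genome[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the upstream stop codon is not part of the frame
        five = pos
        if codon in START_CODONS and not upstream_to_stop:
            break
        pos -= 3

    three = end
    pos = end
    while pos + 3 <= len(genome):
        three = pos + 3
        if genome[pos:pos + 3] in STOP_CODONS:
            break
        pos += 3

    return five, three


# Toy example (hypothetical sequence): the fragment CCCGGG lies at 14..20.
toy = "CC" + "TAA" + "GCA" + "ATG" + "AAA" + "CCC" + "GGG" + "TTT" + "TGA" + "AC"
orf_start, orf_end = extend_to_orf(toy, 14, 20)
print(toy[orf_start:orf_end])
```

To cover the three possible reading frames, the same function may simply be called with fragment coordinates shifted by one or two nucleotides.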

The nucleotide sequences which are complementary to the above sequences according to the invention also form part of the invention.

Polynucleotide having a sequence which is complementary to a nucleotide sequence according to the invention is understood to mean any DNA or RNA sequence whose nucleotides are complementary to those of said sequence according to the invention and whose orientation is reversed.
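The definition above corresponds to what is commonly called the reverse complement. A minimal sketch for DNA follows; the function name is hypothetical, and only the four unambiguous bases are handled (an RNA version would pair A with U instead of T).

```python
# Translation table pairing each DNA base with its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand read in the reverse orientation."""
    return dna.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

Applying the function twice returns the starting sequence, which is a convenient sanity check.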

The nucleotide fragments of the above sequences according to the invention, which are in particular useful as probes or primers, also form part of the invention.

The invention also relates to the polynucleotides, characterized in that they comprise a polynucleotide chosen from:

a) a polynucleotide whose sequence is complementary to the sequence of a polynucleotide according to the invention,

b) a polynucleotide whose sequence comprises at least 50% identity with a polynucleotide according to the invention,

c) a polynucleotide which hybridizes, under high stringency conditions, with a polynucleotide sequence according to the invention,

d) a fragment of at least 8 consecutive nucleotides of a polynucleotide defined according to the invention.

The high stringency conditions as well as the percentage identity will be defined below in the present description.

When the coding sequence derived from the export and/or secretion marker gene is a sequence derived from the phoA gene, the export and/or secretion of the product of the phoA gene, truncated where appropriate, is obtained only when this sequence is inserted in phase with the sequence or element for regulating expression which is placed upstream of it, which contains the elements controlling the expression, export and/or secretion, and which is derived from a mycobacterial sequence.

The recombinant vectors of the invention may of course comprise multiple cloning sites which are shifted by one or two nucleotides relative to a vector according to the invention, thus making it possible to express the polypeptide corresponding to the mycobacterial DNA fragment which is inserted and which is capable of being translated according to one of the three possible reading frames.

For example, the preferred vectors pJVEDb and pJVEDc of the invention are distinguishable from the preferred vector pJVEDa by a respective shift of one and two nucleotides at the level of the multiple cloning site.
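The effect of a one- or two-nucleotide shift can be pictured with a short sketch that splits the same insert into codons in each of the three reading frames. The function name and the example insert are hypothetical; the mapping of shifts to pJVEDa, pJVEDb and pJVEDc only restates the shifts given above and says nothing about which vector brings a particular insert into phase.

```python
# Shift of the multiple cloning site relative to pJVEDa, as stated in the text.
SHIFT_BY_VECTOR = {"pJVEDa": 0, "pJVEDb": 1, "pJVEDc": 2}


def codons_in_frame(seq, frame):
    """Split seq into complete codons, starting at offset frame (0, 1 or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    usable = len(seq) - frame
    last = frame + usable - usable % 3  # drop the incomplete trailing codon
    return [seq[i:i + 3] for i in range(frame, last, 3)]


insert = "ATGGCCTAAG"  # hypothetical insert, for illustration only
for vector, shift in SHIFT_BY_VECTOR.items():
    print(vector, shift, codons_in_frame(insert, shift))
```

Only one of the three frames keeps a given coding sequence in phase with the downstream reporter, which is why a set of three vectors covers every insert.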

Thus, the vectors of the invention are capable of expressing each of the polypeptides which are capable of being encoded by an inserted mycobacterial DNA fragment. Said polypeptides, characterized in that they are therefore capable of being exported and/or secreted, and/or induced or repressed, or expressed constitutively during the infection, form part of the invention.

The polypeptides of the invention whose amino acid sequences are chosenfrom the amino acid sequences SEQ ID NOS 2-7, 9-13, 15-24, 26-30, 32,34, 36-40, 42-45, 47-51, 53-55, 57-61, 63, 65-66, 68, 70-71, 73, 75, 77,79-80, 82-83, 85, 87, 89, 91, 93-95, 97, 99, 101-103, 105, 107, 109,111-112, 114-118, 123-127, 129-132, 134-136, 138, 272-273, 140, 142,144, 146-147, 149, 151, 153, 155, 157, 159, 161, 163-164, 166-168,170-176, 178-183, 185-188, 190-194, 196-199, 201, 203-205, 207-208, 210,212, 214-216, 218-219, 221-224, 226-227, 923-925, 229-237, 239-245,247-249, 251-254, 256 257, 259, 261, 263-267, 269-271, 275-277, 279,281, 283, 285, 287, 289, 291-296, 298-309, 311-316, 318-320, 322, 324,326, 328-330, 332, 334, 336, 338, 340-345, 348-352, 354-356, 358, 360,926-930, 362-363, 365-367, 369-370, 372-373, 375-379, 381-382, 384, 386,388, 390-392, 394, 396, 398, 400-402, 404, 406, 408-409, 411, 413-418,420, 422-425, 427-428, 430, 432, 434-436, 438-440, 442-446, 448-451,453-455, 457-458, 460, 462, 464-468, 470-471, 473, 475, 477-481,483-484, 486, 488, 490-494, 496, 498-500, 502-504, 506-509, 511-515,517-518, 520-521, 523-527, 531-533, 535-536, 538-542, 543, 545, 547-549,551, 553, 555, 557, 559-563, 565-568, 570, 572, 574-575, 577-579,581-583, 585, 587, 589, 591-593, 595, 597, 599, 601-603, 605-607, 609,611, 613, 615, 617, 619, 621, 623, 625, 627-628, 630, 632, 634, 636-639,641-646, 648, 650, 652, 654-656, 658-659, 661, 663, 665, 931-933,667-668, 670-673, 675, 677, 679-682, 684-685, 687-690, 692, 694, 696,698-701, 703-716, 718-727, 729-732, 734-735, 737-738, 740, 742, 744-745,747-751, 753-754, 756, 758, 760, 762-763, 765-766, 768, 770, 772-783,785-793, 795-804, 806, 808, 810, 812, 814-816, 818-820, 822, 824, 826,828-830, 832, 834, 836, 838, 840-841, 843, 845, 847, 849-863, 865-877,879-882, 884, 886, 888-894, 896-900, 902-906, 908, 910, and representedrespectively by FIGS. 1 to 24C (plates 1 to 150), FIGS. 27A to 28(plates 152 to 155) and FIGS. 
30 to 50F (plates 157 to 275) are in particular preferred.

Also forming part of the invention are the fragments or biologically active fragments as well as the polypeptides which are homologous to said polypeptides; fragment, biologically active fragment and polypeptides which are homologous to a polypeptide being as defined below in the description.

The invention also relates to the polypeptides comprising a polypeptide or one of their fragments according to the invention.

The subject of the invention is also recombinant mycobacteria containing a recombinant vector according to the invention which is described above. A preferred mycobacterium is a mycobacterium of the M. smegmatis type.

M. smegmatis advantageously makes it possible to test the efficiency of mycobacterial sequences for controlling the expression, export and/or secretion, and/or promoter activity of a given sequence, for example of a sequence encoding a marker such as alkaline phosphatase and/or luciferase.

Another preferred mycobacterium is a mycobacterium of the M. bovis type, for example the BCG strain which is currently used for vaccination against tuberculosis.

Another preferred mycobacterium is a strain of M. tuberculosis, M. bovis or M. africanum potentially possessing all the appropriate regulatory systems.

The inventors have thus characterized, in particular, a polynucleotide consisting of a nucleotide sequence which is present in all the tested strains of mycobacteria belonging to the Mycobacterium tuberculosis complex. This polynucleotide, called DP428, contains an open reading frame (ORF) encoding a polypeptide of about 12 kD. The open reading frame (ORF) encoding the polypeptide DP428 extends from the nucleotide at position nt 451 to the nucleotide at position nt 861 of the sequence SEQ ID NO: 35, the polypeptide DP428 having the following amino acid sequences SEQ ID NOS: 39 & 543: MKTGTATTRRRLLAVLIALALPGAAVALLAEPSATGASDPCAASEVARTVGSVAKSMGDYLDSHPETNQVMTAVLQQQVGPGSVASLKAHFEANPKVASDLHALSQPLTDLSTRCSLPISGLQAIGLMQAVQGARR.

This molecular weight (MW) corresponds to the theoretical MW of the mature protein obtained after cleavage of the signal sequence, the MW of the protein or polypeptide DP428 being about 10 kD after potential anchorage to peptidoglycan and potential cleavage between S and G of the LPISG motif.

This polynucleotide includes, on the one hand, an open reading frame corresponding to a structural gene and, on the other hand, the signals for regulating the expression of the coding sequence upstream and downstream of the latter. The polypeptide DP428 is composed of a signal peptide, a hydrophilic central region and a hydrophobic C-terminal region. The latter ends with two arginine residues (R), a retention signal, and is preceded by an LPISG motif which resembles the LPXTG motif for anchorage to peptidoglycan (Schneewind et al., 1995).
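As a consistency check on the figures given above, the following sketch relates the ORF coordinates (nt 451 to nt 861, ends included) to the length of the DP428 sequence as printed in the description, and locates the LPISG motif and the C-terminal arginine pair. Variable names are hypothetical, and the sketch assumes that residue numbering starts at the initial methionine of the printed sequence.

```python
# DP428 as printed in the description, written in blocks of ten residues.
DP428 = "".join([
    "MKTGTATTRR", "RLLAVLIALA", "LPGAAVALLA", "EPSATGASDP", "CAASEVARTV",
    "GSVAKSMGDY", "LDSHPETNQV", "MTAVLQQQVG", "PGSVASLKAH", "FEANPKVASD",
    "LHALSQPLTD", "LSTRCSLPIS", "GLQAIGLMQA", "VQGARR",
])

orf_nt = 861 - 451 + 1        # inclusive coordinates
codons = orf_nt // 3          # coding codons plus the stop codon

motif_index = DP428.find("LPISG")            # 0-based index of the motif
motif_position = motif_index + 1             # 1-based residue number of L
cleaved = DP428[:motif_index + 4]            # up to and including S of LPISG

print(orf_nt, codons, len(DP428))
print(motif_position, cleaved[-4:], DP428[-2:])
```

The ORF length accommodates the printed residues plus one stop codon, and the hypothetical cleavage product ends with the S of the LPISG motif, in line with a cut between S and G.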

Structural gene for the purposes of the present invention is understood to mean a polynucleotide encoding a protein, a polypeptide or alternatively a fragment of the latter, said polynucleotide comprising only the sequence corresponding to the open reading frame (ORF), which excludes the sequences on the 5′ side of the open reading frame (ORF) which direct the initiation of transcription.

Thus, the invention relates in particular to a polynucleotide whose sequence is chosen from the nucleotide sequences SEQ ID NOS: 1, 8, 14, 25, 31, 33, and 35.

More particularly, the invention relates to a polynucleotide, characterized in that it comprises a polynucleotide chosen from:

a) a polynucleotide whose sequence is chosen from the nucleotide sequences SEQ ID NOS: 1, 8, 14, 25, 31, 33, and 35,

b) a polynucleotide whose nucleic sequence is the sequence between the nucleotide at position nt 964 and the nucleotide at position nt 1234, ends included, of the sequence SEQ ID NOS: 1, 8, 14, 25, 31, and 33,

c) a polynucleotide whose sequence is complementary to the sequence of a polynucleotide defined in a) or b),

d) a polynucleotide whose sequence exhibits at least 50% identity with a polynucleotide defined in a), b) or c),

e) a polynucleotide which hybridizes, under high stringency conditions, with a sequence of a polynucleotide defined in a), b), c) or d),

f) a fragment of at least 8 consecutive nucleotides of a polynucleotide defined in a), b), c), d) or e).

Nucleotide sequence, polynucleotide or nucleic acid is understood to mean, according to the present invention, a double-stranded DNA, a single-stranded DNA and products of transcription of said DNAs.

Percentage identity for the purpose of the present invention is understood to mean a percentage identity between the bases of two polynucleotides, this percentage being purely statistical and the differences between the two polynucleotides being distributed randomly and over their entire length.
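One simple reading of this definition, given here only as a sketch, counts identical bases position by position over the entire length of two sequences of equal length. The function name is hypothetical; the sketch assumes the two sequences have already been aligned (gaps written as "-"), and the choice of the alignment length as denominator is an assumption, since the description does not fix one.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions carrying the same base.

    Both sequences must already be aligned and of equal length; a gap
    character "-" never counts as a match.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    matches = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


identity = percent_identity("ATGCATGC", "ATGAATGG")
print(identity, identity >= 50.0)  # compared with the 50% threshold of the text
```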

Hybridization under high stringency conditions means that the temperature and ionic strength conditions are chosen such that they allow the hybridization between two complementary DNA fragments to be maintained.

By way of illustration, high stringency conditions of the hybridization step for the purposes of defining the polynucleotide fragments described above are advantageously the following:

the hybridization is carried out at a temperature which is preferably 65° C., in the presence of buffer marketed under the name rapid-hyb buffer by Amersham (RPN 1636) and 100 μg/ml of E. coli DNA.

The washing steps may, for example, be the following:

two washes of 10 min, preferably at 65° C., in a 2×SSC buffer and 0.1% SDS;

two washes of 10 min, preferably at 65° C., in a 1×SSC buffer and 0.1% SDS;

one wash of 10 min, preferably at 65° C., in a 0.1×SSC buffer and 0.1% SDS.

1×SSC corresponds to 0.15 M NaCl and 0.015 M Na citrate and a 1× Denhardt solution corresponds to 0.02% Ficoll, 0.02% of polyvinylpyrrolidone and 0.02% of bovine serum albumin.
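For bench planning, the washing scheme above can be written down as data, from which the total washing time and the dilutions follow. This is only a sketch: the names are hypothetical, and the use of a 20× SSC stock solution is an assumption (a common laboratory practice) that is not stated in the description.

```python
# (number of washes, minutes per wash, SSC strength) as listed in the text.
WASHES = [(2, 10, 2.0), (2, 10, 1.0), (1, 10, 0.1)]
NACL_MOLAR_AT_1X = 0.15  # NaCl concentration of 1x SSC, from the text


def stock_volume_ml(final_ml, final_strength, stock_strength=20.0):
    """Volume of stock needed for final_ml of buffer (C1 * V1 = C2 * V2).

    The 20x default stock strength is an assumption, not taken from the text.
    """
    return final_ml * final_strength / stock_strength


total_wash_minutes = sum(count * minutes for count, minutes, _ in WASHES)
print(total_wash_minutes)
for _, _, strength in WASHES:
    print(strength, strength * NACL_MOLAR_AT_1X, stock_volume_ml(500, strength))
```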

Advantageously, a nucleotide fragment corresponding to the preceding definition will have at least 8 nucleotides, preferably at least 12 nucleotides, and still more preferably at least 20 consecutive nucleotides of the sequence from which it is derived. The high stringency hybridization conditions described above for a polynucleotide having a size of about 200 bases will be adjusted by persons skilled in the art for oligonucleotides with a larger or a smaller size, according to the teaching of Sambrook et al., 1989.

For the conditions for using the restriction enzymes with the aim of obtaining nucleotide fragments of the polynucleotides according to the invention, reference will be advantageously made to the manual by Sambrook et al., 1989.

Advantageously, a polynucleotide of the invention will contain at least one sequence comprising the stretch of nucleotides going from the nucleotide at position nt 964 to the nucleotide nt 1234 of the polynucleotide having the sequence SEQ ID NOS 1, 8, 14, 25, 31, and 33.

The subject of the present invention is a polynucleotide according to the invention, characterized in that its nucleic sequence hybridizes with the DNA of a sequence of mycobacteria and preferably with the DNA of a sequence of mycobacteria belonging to the Mycobacterium tuberculosis complex.

The polynucleotide is encoded by a polynucleotide sequence as described supra.

The subject of the present invention is also a polypeptide derived from a mycobacterium, characterized in that it is present only in the mycobacteria belonging to the Mycobacterium tuberculosis complex.

The invention also relates to a polypeptide characterized in that it comprises a polypeptide chosen from:

a) a polypeptide whose amino acid sequence is included in an amino acidsequence chosen from the amino acid sequences SEQ ID NOS 2-7, 9-13,15-24, 26-30, 32, 34, 36-40, 42-45, 47-51, 53-55, 57-61, 63, 65-66, 68,70-71, 73, 75, 77, 79-80, 82-83, 85, 87, 89, 91, 93-95, 97, 99, 101-103,105, 107, 109, 111-112, 120-121, 123-127, 129-132, 134-136, 138,272-273, 140, 142, 144, 146-147, 149, 151, 153, 155, 157, 159, 161,163-164, 166-168, 170-176, 185-188, 190-194, 196-199, 201, 203-205,207-208, 210, 212, 214-216, 218-219, 221-224, 226-227, 923-925, 229-237,239-245, 247-249, 251-254, 256-257, 259, 261, 263-267, 269-271, 275-277,279, 281, 283, 285, 287, 289, 291-296, 298-309, 311-316, 318-320, 322,324, 326, 328-330, 332, 334, 336, 338, 340-345, 348-352, 354-356, 358,360, 926-930, 362-363, 365-367, 369-370, 372-373, 375-379, 381-382, 384,386, 388, 390-392, 394, 396, 398, 400-402, 404, 406, 408-409, 411,413-418, 420, 422-425, 427-428, 430, 432, 434-436, 438-440, 442-446,448-451, 453-455, 457-458, 460, 462, 464-468, 470-471, 473, 475,477-481, 483-484, 486, 488, 490-494, 496, 498-500, 502-504, 506-509,511-515, 517-518, 520-521, 523-527, 531-533, 535-536, 538-542, 543, 545,547-549, 551, 553, 555, 557, 559-563, 565-568, 570, 572, 574-575,577-579, 581-583, 585, 587, 589, 591-593, 595, 597, 599, 601-603,605-607, 609, 611, 613, 615, 617, 619, 621, 623, 625, 627-628, 630, 632,634, 636-639, 641-646, 648, 650, 652, 654-656, 658-659, 661, 663, 665,931-933, 667-668, 670-673, 675, 677, 679-682, 684-685, 687-690, 692,694, 696, 698-701, 703-716, 718-727, 729-732, 734-735, 737-738, 740,742, 744-745, 747-751, 753-754, 756, 758, 760, 762-763, 765-766, 768,770, 772-783, 785-793, 795-804, 806, 808, 810, 812, 814-816, 818-820,822, 824, 826, 828-830, 832, 834, 836, 838, 840-841, 843, 845, 847,849-863, 865-877, 879-882, 884, 886, 888-894, 896-900, 902-906, 908, and910,

b) a polypeptide which is homologous to the polypeptide defined in a),

c) a fragment of at least 5 amino acids of a polypeptide defined in a) or b),

d) a biologically active fragment of a polypeptide defined in a), b) or c).

The subject of the present invention is also a polypeptide whose amino acid sequence is included in the amino acid sequences SEQ ID NOS: 2-7, 9-13, 15-24, 26-30, 32, 34, 36-40, or a polypeptide having the amino acid sequence SEQ ID NO: 543.

Homologous polypeptide will be understood to designate the polypeptides exhibiting, relative to the natural polypeptide according to the invention such as the polypeptide DP428, certain modifications such as in particular a deletion, addition or substitution of at least one amino acid, a truncation, an extension, a chimeric fusion, and/or a mutation. Among the homologous polypeptides, those whose amino acid sequence exhibits at least 30%, preferably 50%, homology with the amino acid sequences of the polypeptides according to the invention are preferred. In the case of a substitution, one or more consecutive or nonconsecutive amino acids are replaced with “equivalent” amino acids. The expression “equivalent” amino acid is intended here to designate any amino acid capable of being substituted for one of the amino acids of the parent structure without, however, essentially modifying the immunogenic properties of the corresponding peptides. In other words, the equivalent amino acids will be those which allow the production of a polypeptide having a modified sequence which allows the induction in vivo of antibodies or of cells capable of recognizing the polypeptide whose amino acid sequence is included in the amino acid sequence of the polypeptide according to the invention, such as the amino acid sequences SEQ ID NOS: 2-7, 9-13, 15-24, 26-30, 32, 34, 36-40, or a polypeptide having the amino acid sequence SEQ ID NO: 543 (polypeptide DP428) or one of its above-defined fragments.

These equivalent aminoacyls may be determined either based on their structural homology with the aminoacyls for which they are substituted, or on the results of cross-immunogenicity assays to which the different peptides are capable of giving rise.

By way of example, substitutions which are capable of being made without resulting in a profound modification of the immunogenicity of the corresponding modified peptides include the replacement of leucine with valine or isoleucine, of aspartic acid with glutamic acid, of glutamine with asparagine and of arginine with lysine, and the like, it being naturally possible to envisage the reverse substitutions under the same conditions.

Biologically active fragment will be understood to designate in particular a fragment of an amino acid sequence of a polypeptide having at least one of the characteristics of the polypeptides according to the invention, in particular in that it is:

capable of being exported and/or secreted by a mycobacterium, and/or of being induced or repressed during infection with the mycobacterium; and/or

capable of inducing, repressing or modulating, directly or indirectly, a mycobacterium virulence factor; and/or

capable of inducing an immunogenicity reaction directed against mycobacteria; and/or

capable of being recognized by an antibody which is specific for mycobacterium.

Polypeptide fragment is understood to designate a polypeptide comprising a minimum of 5 amino acids, preferably 10 amino acids and more preferably 15 amino acids.

A polypeptide of the invention, or one of its fragments, as defined above, is capable of being specifically recognized by the antibodies present in the serum of patients infected by mycobacteria and preferably bacteria belonging to the Mycobacterium tuberculosis complex or by cells of the infected host.

Thus, forming part of the invention are the fragments of the polypeptide whose amino acid sequence is included in the amino acid sequence of a polypeptide according to the invention, such as the amino acid sequences SEQ ID NOS: 2-7, 9-13, 15-24, 26-30, 32, 34, 36-40, or a polypeptide having an amino acid sequence SEQ ID NO: 543, which may be obtained by cleavage of said polypeptide with a proteolytic enzyme, such as trypsin or chymotrypsin or collagenase, or with a chemical reagent, such as cyanogen bromide (CNBr) or alternatively by placing a polypeptide according to the invention such as the polypeptide DP428 in a very acidic environment, for example at pH 2.5. Preferred peptide fragments according to the invention, for use in diagnosis or in vaccination, are the fragments contained in regions of a polypeptide according to the invention such as the polypeptide DP428 which are capable of being naturally exposed to the solvent and of thus exhibiting substantial immunogenicity properties. Such peptide fragments may be prepared by chemical synthesis, from hosts transformed with an expression vector according to the invention containing a nucleic acid allowing the expression of said fragments, placed under the control of appropriate regulatory and/or expression elements, or alternatively by chemical or enzymatic cleavage.

Analysis of the hydrophilicity of the polypeptide DP428 was carried out with the aid of the DNA Strider™ software (marketed by CEA Saclay) on the basis of a calculation of the hydrophilic character of the region encoding DP428 of SEQ ID NO: 543. The results of this analysis are presented in FIG. 54 where the hydrophilicity index is detailed, for each of the amino acids (AA) having a defined position in SEQ ID NO: 543. The higher the hydrophilicity index, the more the amino acid considered is likely to be exposed to the solvent in the native molecule, and is subsequently likely to exhibit a high degree of antigenicity. Thus, a stretch of at least seven amino acids possessing a high hydrophilicity index (>0.3) can constitute the basis of the structure of an immunogenic candidate peptide according to the present invention.
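The selection criterion stated above (a stretch of at least seven amino acids with an index above 0.3) lends itself to a short sketch. The function name is hypothetical, and the per-residue index values are taken as an input list: the values below are invented for illustration and are not the values of FIG. 54, nor is the scale used by the DNA Strider software reproduced here.

```python
def hydrophilic_stretches(indices, threshold=0.3, min_len=7):
    """Return (first, last) 1-based positions of runs of at least min_len
    consecutive residues whose hydrophilicity index exceeds threshold."""
    stretches = []
    run_start = None
    for pos, value in enumerate(indices, start=1):
        if value > threshold:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and pos - run_start >= min_len:
                stretches.append((run_start, pos - 1))
            run_start = None
    # A run may extend to the last residue of the sequence.
    if run_start is not None and len(indices) - run_start + 1 >= min_len:
        stretches.append((run_start, len(indices)))
    return stretches


# Invented index values: one qualifying run of 7 residues, one run too short.
example = [0.1] * 3 + [0.5] * 7 + [0.0] + [0.9] * 4
print(hydrophilic_stretches(example))
```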

The cellular immune responses of the host to a polypeptide according to the invention can be demonstrated according to the techniques described by Colignon et al., 1996.

From the data of the hydrophilicity map presented in FIG. 54, the inventors were able to define regions of the polypeptide DP428 which are preferably exposed to the solvent, more particularly the region located between amino acids 55 and 72 of the sequence SEQ ID NO: 543 and the region located between amino acids 99 and 107 of SEQ ID NO: 543.
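For orientation, the two regions named above can be read off the DP428 sequence printed earlier in the description. This is a sketch under two assumptions: that SEQ ID NO: 543 is identical to that printed sequence, and that residue 1 is its initial methionine with both end positions included. The helper name is hypothetical.

```python
# DP428 as printed in the description, written in blocks of ten residues.
DP428 = "".join([
    "MKTGTATTRR", "RLLAVLIALA", "LPGAAVALLA", "EPSATGASDP", "CAASEVARTV",
    "GSVAKSMGDY", "LDSHPETNQV", "MTAVLQQQVG", "PGSVASLKAH", "FEANPKVASD",
    "LHALSQPLTD", "LSTRCSLPIS", "GLQAIGLMQA", "VQGARR",
])


def region(sequence, first, last):
    """Return residues first..last (1-based, both ends included)."""
    return sequence[first - 1:last]


for first, last in [(55, 72), (99, 107)]:
    peptide = region(DP428, first, last)
    print(first, last, len(peptide), peptide)
```

Both regions are longer than the seven-residue minimum mentioned for an immunogenic candidate peptide.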

The peptide regions of the polypeptide DP428 which are defined above may be advantageously used for the production of immunogenic compositions or of vaccine compositions according to the invention.

The polynucleotides characterized in that they encode a polypeptide according to the invention also form part of the invention.

The invention also relates to the nucleic acid sequences which can be used as probes or primers, characterized in that said sequences are chosen from the nucleic acid sequences of polynucleotides according to the invention.

The invention relates, in addition, to the use of a nucleic acid sequence of polynucleotides according to the invention as a probe or a primer for the detection and/or amplification of a nucleic acid sequence. Among the nucleic acid sequences according to the invention which can be used as probes or primers, preference is given to the sequences, or their complementary sequences, lying between the nucleotide at position nt 964 and the nucleotide at position nt 1234, ends included, of the sequence SEQ ID NOS: 1, 8, 14, 25, 31, and 33.

Among the polynucleotides according to the invention which can be used as nucleotide primers, the polynucleotides having the sequences SEQ ID NO: 528 and SEQ ID NO: 529 are particularly preferred.

The polynucleotides according to the invention may thus be used to select nucleotide primers, in particular for the PCR technique (Erlich, 1989; Innis et al., 1990; and Rolfs et al., 1991).

This technique requires the choice of oligonucleotide pairs flanking the fragment which has to be amplified. Reference may be made, for example, to the technique described in American patent U.S. Pat. No. 4,683,202. These oligodeoxyribonucleotide or oligoribonucleotide primers advantageously have a length of at least 8 nucleotides, preferably of at least 12 nucleotides, and still more preferably of at least 20 nucleotides. Primers having a length of between 8 and 30 and preferably between 12 and 22 nucleotides will be preferred in particular. One of the two primers is complementary to the (+) strand [forward primer] of the template and the other primer is complementary to the (−) strand [backward primer]. It is important that the primers do not possess a secondary structure or sequences which are complementary to each other. Moreover, the length and the sequence of each primer should be chosen so that the primers do not hybridize with other nucleic acids from prokaryotic or eukaryotic cells, in particular with the nucleic acids from other pathogenic mycobacteria, or with human DNA or RNA which may possibly contaminate the biological sample.
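Two of the criteria above, primer length and the absence of mutually complementary sequences, can be screened with a short sketch. The function names, the example primers and the minimum stretch length `k=4` are hypothetical choices for illustration; the description does not specify a threshold, and the example primers are not SEQ ID NO: 528 or 529.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complementary strand read in the reverse orientation."""
    return dna.translate(_COMPLEMENT)[::-1]


def length_class(primer):
    """Classify a primer against the 12-22 and 8-30 nucleotide ranges of the text."""
    n = len(primer)
    if 12 <= n <= 22:
        return "preferred"
    if 8 <= n <= 30:
        return "acceptable"
    return "out of range"


def share_complementary_stretch(primer_a, primer_b, k=4):
    """True if some k-mer of primer_a can base-pair with a k-mer of primer_b.

    Passing the same primer twice screens for self-complementarity.
    The value of k is an assumed threshold, not taken from the text.
    """
    target = reverse_complement(primer_b)
    return any(primer_a[i:i + k] in target for i in range(len(primer_a) - k + 1))


forward = "AAAAAAAAAAAA"   # hypothetical primers, upper-case DNA
backward = "TTTTTTTTTTTT"
print(length_class(forward), share_complementary_stretch(forward, backward))
```

A pair flagged by this screen would be expected to form primer dimers and would normally be redesigned.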

The results presented in FIG. 51 show that the sequence encoding the polypeptide DP428 (SEQ ID NO: 543) is not found in the DNAs of M. fortuitum, M. simiae, M. avium, M. chelonae, M. flavescens, M. gordonae, M. marinum and M. kansasii.

The amplified fragments may be identified after agarose or polyacrylamide gel electrophoresis or after capillary electrophoresis, or alternatively after a chromatographic technique (gel filtration, hydrophobic chromatography or ion-exchange chromatography). The specificity of the amplification may be checked by molecular hybridization using, as probes, the nucleotide sequences of polynucleotides of the invention, plasmids containing these sequences or their amplification products.

The amplified nucleotide fragments may be used as reagents in hybridization reactions in order to detect the presence, in a biological sample, of a target nucleic acid having a sequence which is complementary to that of said amplified nucleotide fragments.

Among the polynucleotides according to the invention which can be used as nucleotide probes, the polynucleotide fragment comprising the sequence between the nucleotide at position nt 964 and the nucleotide at position nt 1234, ends included, of the sequence SEQ ID NO: 1 is most particularly preferred.

These probes and amplicons may or may not be labeled with radioactive elements or with nonradioactive molecules such as enzymes or fluorescent elements.

The invention also relates to the nucleotide fragments which are capable of being obtained by amplification with the aid of primers according to the invention.

Other techniques for the amplification of the target nucleic acid may be advantageously used as alternatives to PCR.

The SDA (Strand Displacement Amplification) technique (Walker et al., 1992) is an isothermal amplification technique whose principle is based on the capacity of a restriction enzyme to cut one of the two strands of its recognition site which is in the form of a hemiphosphorothioate and on the property of a DNA polymerase to initiate the synthesis of a new DNA strand from the 3′OH end created by the restriction enzyme and to displace the strand previously synthesized which is present downstream.

The polynucleotides of the invention, in particular the primers according to the invention, may also be used in other methods of amplifying a target nucleic acid, such as:

the TAS (Transcription-based Amplification System) technique described by Kwoh et al. in 1989;

the 3SR (Self-Sustained Sequence Replication) technique described by Guatelli et al. in 1990;

the NASBA (Nucleic Acid Sequence Based Amplification) technique described by Kievits et al. in 1991;

the TMA (Transcription Mediated Amplification) technique.

The polynucleotides of the invention may also be used in techniques for the amplification or modification of the nucleic acid serving as probe, such as:

the LCR (Ligase Chain Reaction) technique described by Landegren et al. in 1988 and improved by Barany et al. in 1991, which uses a heat-stable ligase;

the RCR (Repair Chain Reaction) technique described by Segev in 1992;

the CPR (Cycling Probe Reaction) technique described by Duck et al. in 1990;

the Q-beta-replicase amplification technique described by Miele et al. in 1983 and improved in particular by Chu et al. in 1986, Lizardi et al. in 1988 and then by Burg et al. as well as Stone et al. in 1996.

In the case where the target polynucleotide to be detected is an RNA, for example an mRNA, a reverse transcriptase-type enzyme will be advantageously used, prior to carrying out an amplification reaction using the primers according to the invention or a method of detection using the probes of the invention, in order to obtain a cDNA from the RNA contained in the biological sample. The cDNA obtained will then serve as target for the primers or probes used in the method of amplification or detection according to the invention.

The detection probe will be chosen so that it hybridizes with the amplicon generated. Such a detection probe will advantageously have a sequence of at least 12 nucleotides, in particular of at least 15 nucleotides, and preferably of at least 200 nucleotides.

The nucleotide probes according to the invention are capable of detecting mycobacteria and preferably bacteria belonging to the Mycobacterium tuberculosis complex, more particularly because of the fact that these mycobacteria possess in their genome at least one copy of polynucleotides according to the invention. These probes according to the invention are capable, for example, of hybridizing with the nucleotide sequence encoding a polypeptide according to the invention, more particularly any oligonucleotide hybridizing with the sequences SEQ ID NOS 1, 8, 14, 25, 31, and 33 encoding the M. tuberculosis polypeptide DP428 and not exhibiting a cross-hybridization reaction or an amplification reaction (PCR) with, for example, sequences present in mycobacteria not belonging to the Mycobacterium tuberculosis complex. The nucleotide probes according to the invention hybridize specifically with a DNA or RNA molecule of a polynucleotide according to the invention, under high stringency hybridization conditions as given in the form of an example above.

The nonlabeled sequences may be used directly as probes. However, thesequences are generally labeled with a radioactive element (³²P, ³⁵S,³H, ¹²⁵I) or with a nonradioactive molecule (biotin,acetylaminofluorene, digoxigenin, 5-bromodeoxyuridine, fluorescein) inorder to obtain probes which can be used for many applications.

Examples of nonradioactive labelings of probes are described, forexample, in French patent No. 78,10975 or by Urdea et al. or bySanchez-Pescador et al. in 1988.

In the latter case, it will also be possible to use one of the labelingmethods described in patents FR 2,422,956 and FR 2,518,755. Thehybridization technique may be carried out in various ways (Matthews etal., 1988). The most common method consists in immobilizing the nucleicacid extracted from mycobacterial cells onto a support (such asnitrocellulose, nylon, polystyrene) and in incubating, underwell-defined conditions, the immobilized target nucleic acid with theprobe. After hybridization, the excess probe is removed and the hybridmolecules formed are detected by the appropriate method (measurement ofthe radioactivity, of the fluorescence or of the enzymatic activitylinked to the probe).

Advantageously, the labeled nucleotide probes according to the invention may have a structure such that they make amplification of the radioactive or nonradioactive signal possible. An amplification system corresponding to the above definition will comprise detection probes in the form of a branched, ramified DNA such as those described by Urdea et al. in 1991. According to this technique, several types of probe, in particular a capture probe, to immobilize the target DNA or RNA to a support, and a detection probe will be advantageously used. The detection probe binds a “branched” DNA having a ramified structure. The branched DNA in turn is capable of binding oligonucleotide probes which are themselves coupled to alkaline phosphatase molecules. The activity of this enzyme is then detected using a chemiluminescent substrate, for example a derivative of dioxetane phosphate.

According to another advantageous embodiment, the nucleic probes according to the invention can be covalently or noncovalently immobilized on a support and used as capture probes. In this case, a probe termed “capture probe” is immobilized on a support and serves to capture, through specific hybridization, the target nucleic acid obtained from the biological sample to be tested. If necessary, the solid support is separated from the sample and the duplex formed between the capture probe and the target nucleic acid is then detected by means of a second probe termed “detection probe” which is labeled with an easily detectable element.

The oligonucleotide fragments may be obtained from the sequences according to the invention by cleavage with restriction enzymes or by chemical synthesis according to conventional methods, for example according to the method described in European patent No. EP-0,305,929 (Millipore Corporation) or by other methods.

An appropriate method of preparing the nucleic acids of the invention comprising a maximum of 200 nucleotides (or 200 bp in the case of double-stranded nucleic acids) comprises the following steps:

synthesis of DNA using the automated beta-cyanoethylphosphoramidite method described in 1986,

cloning of the nucleic acids thus obtained into an appropriate vector and recovery of the nucleic acids by hybridization with an appropriate probe.

A method of preparation, by the chemical route, of nucleic acids according to the invention having a length greater than 200 nucleotides (or 200 bp in the case of double-stranded nucleic acids) comprises the following steps:

assembly of chemically synthesized oligonucleotides, provided at their ends with different restriction sites, whose sequences are compatible with the stretch of amino acids of the natural polypeptide according to the principle described in 1983,

cloning of the nucleic acids thus obtained into an appropriate vector and recovery of the desired nucleic acids by hybridization with an appropriate probe.

The nucleotide probes used for recovering the desired nucleic acids in the abovementioned methods generally consist of 8 to 200 nucleotides of the polypeptide sequence according to the invention and are capable of hybridizing with the nucleic acid tested for under the hybridization conditions defined above. The synthesis of these probes may be carried out according to the automated beta-cyanoethylphosphoramidite method described in 1986.

The oligonucleotide probes according to the invention may be used in a detection device comprising an oligonucleotide array library. An exemplary embodiment of such an array library may consist of an array of probe oligonucleotides which are attached to a support, the sequence of each probe of a given length being situated with a shift of one or more bases relative to the preceding probe, each of the probes of the array arrangement thus being complementary to a distinct sequence of the target DNA or RNA to be detected and each probe of known sequence being attached at a predetermined position of the support. The target sequence to be detected may be advantageously labeled radioactively or nonradioactively. When the labeled target sequence is brought into contact with the array device, it forms hybrids with the probes having complementary sequences. A nuclease treatment, followed by washing, makes it possible to remove the probe-target sequence hybrids which are not perfectly complementary. Because of the precise knowledge of the sequence of a probe at a given position of the array, it is then possible to deduce the nucleotide sequence of the target DNA or RNA sequence. This technique is particularly effective when matrices of oligonucleotide probes of a large size are used.
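The tiling principle of such an array may be sketched as follows. This is an illustrative model only: the function names, the probe length of 4 and the short example sequences are assumptions chosen for readability, and the nuclease and washing steps are modeled simply as "keep perfect matches only".

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def tile_probes(target: str, probe_len: int, shift: int = 1) -> dict:
    """Map each array position to a probe complementary to one window of the target.

    Successive windows are offset by `shift` bases, as in the array described above.
    """
    last_start = len(target) - probe_len
    return {
        pos: reverse_complement(target[pos:pos + probe_len])
        for pos in range(0, last_start + 1, shift)
    }


def perfect_hybrids(sample: str, probes: dict) -> list:
    """Return the array positions whose probe finds a perfectly complementary site in the sample.

    Imperfect hybrids are assumed to have been removed by nuclease treatment and washing.
    """
    sample = sample.upper()
    return [pos for pos, probe in probes.items() if reverse_complement(probe) in sample]


if __name__ == "__main__":
    reference = "ATGCGTAC"   # hypothetical reference target
    variant = "ATGCGAAC"     # same sequence with one substituted base
    array = tile_probes(reference, probe_len=4)
    print(perfect_hybrids(reference, array))  # every position hybridizes
    print(perfect_hybrids(variant, array))    # positions overlapping the substitution drop out
```

Because each probe sits at a known position, the pattern of positions that retain a hybrid indicates where the sample agrees with the reference and where it differs.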

An alternative to the use of a labeled target sequence may consist of using a support allowing a “bioelectronic” detection of the hybridization of the target sequence with the probes of the array support, when said support consists of or comprises a material capable of acting, for example, as an electron donor at the positions of the array where a hybrid has been formed. Such an electron-donating material is for example gold. The nucleotide sequence of the target DNA or RNA is then determined by an electronic device.

An exemplary embodiment of a biosensor, as defined above, is described in European patent application No. EP-0,721,016 in the name of Affymax Technologies N.V. or in U.S. Pat. No. 5,202,231 in the name of Drmanac.

The subject of the invention is also the hybrid polynucleotides resulting:

either from the formation of a hybrid molecule between an RNA or a DNA (genomic DNA or cDNA) obtained from a biological sample with a probe or a primer according to the invention,

or from the formation of a hybrid molecule between an RNA or a DNA (genomic DNA or cDNA) obtained from a biological sample with a nucleotide fragment amplified with the aid of a pair of primers according to the invention.

cDNA for the purposes of the invention is understood to mean a DNA molecule obtained by causing a reverse transcriptase type enzyme to act on an RNA molecule, in particular a messenger RNA (mRNA) molecule, according to the techniques described in Sambrook et al. in 1989.

The subject of the present invention is also a family of recombinant plasmids, characterized in that they contain at least one nucleotide sequence of a polynucleotide according to the invention. According to an advantageous embodiment, said plasmid comprises the nucleotide sequences SEQ ID NOS: 1, 8, 14, 25, 31, and 33, or a fragment thereof.

Another subject of the present invention is a vector for the cloning, expression and/or insertion of a sequence, characterized in that it comprises a nucleotide sequence of a polynucleotide according to the invention at a site which is not essential for its replication, where appropriate under the control of regulatory elements capable of playing a role in the expression of the polypeptide DP428, in a given host.

Specific vectors are for example plasmids, phages, cosmids, phagemids and YACs.

These vectors are useful for transforming host cells so as to clone or express the nucleotide sequences of the invention.

The invention also comprises the host cells transformed with a vector according to the invention.

Preferably, the host cells are transformed under conditions allowing the expression of a recombinant polypeptide according to the invention.

A preferred host cell according to the invention is the E. coli strain transformed with the plasmid pDP428 deposited on 28 Jan. 1997 at the CNCM under the No. I-1818 or transformed with the plasmid pM1C25 which was deposited on 4 Aug. 1998 at the CNCM under the No. I-2062 or a mycobacterium belonging to a strain of M. tuberculosis, M. bovis or M. africanum potentially possessing all the appropriate regulatory systems.

It is now easy to produce proteins or polypeptides in a relatively large quantity by genetic engineering using, as expression vectors, plasmids, phages or phagemids. All or part of the DP428 gene, or any polynucleotide according to the invention, may be inserted into an appropriate expression vector in order to produce in vitro a polypeptide according to the invention, in particular the polypeptide DP428. Said polypeptide may be attached to a microplate in order to develop a serological test intended to search, for diagnostic purposes, for the specific antibodies in patients suffering from tuberculosis.

Thus, the present invention relates to a method of preparing a polypeptide, characterized in that it uses a vector according to the invention. More particularly, the invention relates to a method of preparing a polypeptide of the invention comprising the following steps:

where appropriate, the prior amplification, according to the PCR technique, of the quantity of nucleotide sequences encoding said polypeptide with the aid of two DNA primers chosen so that one of these primers is identical to the first 10 to 25 nucleotides of the nucleotide sequence encoding said polypeptide, while the other primer is complementary to the last 10 to 25 nucleotides (or hybridizes with these last 10 to 25 nucleotides) of said nucleotide sequence, or conversely so that one of these primers is identical to the last 10 to 25 nucleotides of said sequence, while the other primer is complementary to the first 10 to 25 nucleotides (or hybridizes with the first 10 to 25 nucleotides) of said nucleotide sequence, followed by the introduction of said sequences thus amplified into an appropriate vector,

the culture, in an appropriate culture medium, of a cellular host which has been previously transformed with an appropriate vector containing a nucleic acid according to the invention comprising the nucleotide sequence encoding said polypeptide, and

the separation, from said culture medium, of said polypeptide produced by said transformed cellular host.
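The primer selection rule of the optional amplification step above can be sketched in a few lines. The function name `design_primers`, the default length of 20 and the example coding sequence are hypothetical; only the 10 to 25 nucleotide range and the "identical to the start, complementary to the end" rule come from the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_primers(coding_seq: str, length: int = 20) -> tuple:
    """Return (forward, reverse) primers for a coding sequence.

    The forward primer is identical to the first `length` nucleotides; the
    reverse primer is complementary to the last `length` nucleotides and is
    written 5'->3'.
    """
    if not 10 <= length <= 25:
        raise ValueError("primer length must be between 10 and 25 nucleotides")
    coding_seq = coding_seq.upper()
    if len(coding_seq) < length:
        raise ValueError("coding sequence is shorter than the requested primer")
    forward = coding_seq[:length]
    reverse = reverse_complement(coding_seq[-length:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 21-nucleotide coding sequence, for illustration only.
    fwd, rev = design_primers("ATGGCCAAGCTTGGATCCTGA", length=10)
    print(fwd, rev)
```

In practice, primer choice would also take melting temperature, secondary structure and specificity into account; none of these is modeled in this sketch.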

The subject of the invention is also a polypeptide which is capable of being obtained by a method of the invention as described above.

The peptides according to the invention may also be prepared by techniques which are conventionally used in the field of peptide synthesis. This synthesis may be carried out in homogeneous solution or in solid phase.

For example, the technique of synthesis in homogeneous solution described by Houben-Weyl in 1974 will be used.

This method of synthesis consists in successively condensing in pairs the successive aminoacyls in the required order, or in condensing aminoacyls and fragments formed beforehand and already containing several aminoacyls in the appropriate order, or alternatively several fragments thus prepared beforehand, it being understood that care will be taken to protect beforehand all the reactive functions carried by these aminoacyls or fragments, with the exception of the amine functions of one and the carboxyl functions of the other or vice versa, which should normally be involved in the formation of the peptide bonds, in particular after activation of the carboxyl function, according to methods well known in peptide synthesis. As a variant, use may be made of coupling reactions using conventional coupling reagents, of the carbodiimide type, such as for example 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide.

When the aminoacyl used possesses an additional acid function (in particular in the case of glutamic acid), this function will be protected, for example with t-butyl ester groups.

In the case of gradual synthesis, amino acid by amino acid, the synthesis preferably starts with the condensation of the C-terminal amino acid with the amino acid which corresponds to the neighboring aminoacyl in the desired sequence, and so on, step by step, up to the N-terminal amino acid.

According to another preferred technique of the invention, the one described by Merrifield is used.

To manufacture a peptide chain according to the Merrifield method, use is made of a very porous polymer resin onto which the first C-terminal amino acid of the chain is attached. This amino acid is attached to the resin via its carboxyl group and its amine function is protected, for example with the t-butyloxycarbonyl group.

When the first C-terminal amino acid is thus attached to the resin, the group for protecting the amine function is removed by washing the resin with an acid.

In the case where the group for protecting the amine function is the t-butyloxycarbonyl group, it may be removed by treating the resin with trifluoroacetic acid.

The second amino acid which provides the second aminoacyl of the desired sequence, from the C-terminal aminoacyl residue, is then coupled with the deprotected amine function of the first C-terminal amino acid attached to the chain. Preferably, the carboxyl function of this second amino acid is activated, for example with dicyclohexylcarbodiimide, and the amine function is protected, for example with t-butyloxycarbonyl.

The first portion of the desired peptide chain is thus obtained which comprises two amino acids, and whose terminal amine function is protected. As before, the amine function is deprotected and it is then possible to proceed to the attachment of the third aminoacyl, under conditions similar to those for the addition of the second C-terminal amino acid.

The amino acids which will constitute the peptide chain will thus be attached, one after the other, to the amino group, each time deprotected beforehand, of the portion of the peptide chain which is already formed and which is attached to the resin.

When the entire desired peptide chain is formed, the groups for protecting the different amino acids constituting the peptide chain are removed and the peptide is detached from the resin, for example with the aid of hydrofluoric acid.
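The order of operations in the solid-phase procedure just described (attach the C-terminal residue, then alternate deprotection and coupling toward the N-terminus, then cleave) can be written out as a simple planning routine. This is a bookkeeping sketch, not a synthesis protocol: the function name is hypothetical, the peptide is given in one-letter code from N- to C-terminus, and "Boc", "TFA", "DCC" and "HF" are used here as shorthand for the t-butyloxycarbonyl group, trifluoroacetic acid, dicyclohexylcarbodiimide and hydrofluoric acid named in the text.

```python
def merrifield_steps(peptide: str) -> list:
    """List the solid-phase steps for a peptide written N-terminus -> C-terminus."""
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    residues = list(peptide.upper())
    steps = [f"attach Boc-{residues[-1]} to the resin via its carboxyl group"]
    # Remaining residues are added one at a time, from the C-terminus toward the N-terminus.
    for residue in reversed(residues[:-1]):
        steps.append("remove Boc from the resin-bound chain with TFA")
        steps.append(f"couple DCC-activated Boc-{residue} to the free amine")
    steps.append("remove protecting groups and cleave the peptide from the resin with HF")
    return steps


if __name__ == "__main__":
    # Hypothetical tripeptide Gly-Ala-Val, used only to show the ordering.
    for number, step in enumerate(merrifield_steps("GAV"), start=1):
        print(number, step)
```

For the tripeptide in the example, valine is attached first and glycine is coupled last, which mirrors the C-terminal to N-terminal direction of the method.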

Preferably, said polypeptides which are capable of being obtained by a method of the invention as described above will comprise a region exposed to the solvent and will have a length of at least 20 amino acids.

According to another embodiment of the invention, said polypeptides are specific to mycobacteria of the Mycobacterium tuberculosis complex and are not therefore recognized by antibodies specific for other mycobacterial proteins.

The invention relates, in addition, to hybrid polypeptides having at least one polypeptide according to the invention and a sequence of a polypeptide capable of inducing an immune response in humans or animals.

Advantageously, the antigenic determinant is such that it is capable of inducing a humoral and/or cellular response.

Such a determinant may comprise a polypeptide according to the invention, in glycosylated form, which is used to obtain immunogenic compositions capable of inducing the synthesis of antibodies directed against multiple epitopes. Said glycosylated polypeptides also form part of the invention.

These hybrid molecules may consist in part of a polypeptide-carrying molecule according to the invention combined with a portion, in particular an epitope of the diphtheria toxin, the tetanus toxin, a hepatitis B virus surface antigen (patent FR 79 21811), the VP1 antigen of the poliomyelitis virus or any other viral or bacterial toxin or antigen.

Advantageously, said antigenic determinant corresponds to an antigenic determinant of immunogenic proteins of 45/47 kD of M. tuberculosis (international application PCT/FR 96/0166), or alternatively which are selected for example from ESAT6 (Harboe et al., 1996, Andersen et al., 1995, and Sorensen et al., 1995) and DES (PCT/FR 97/00923, Gicquel et al.).

A viral antigen, as defined above, will be preferably a hepatitis virus surface or envelope protein, for example the hepatitis B surface protein in one of its S, S-preS1, S-preS2 or S-preS2-preS1 forms or alternatively a protein of a hepatitis A virus, or of a hepatitis non-A, non-B virus, such as a hepatitis C, E or delta virus.

More particularly, a viral antigen as defined above will be the whole or part of one of the glycoproteins encoded by the genome of the HIV-1 virus (patents GB 8324800, EP 84401834 or EP 85905513) or of the HIV-2 virus (EP 87400151), and in particular the whole or part of a protein selected from gag, pol, nef or env of HIV-1 or HIV-2.

The methods for synthesizing the hybrid molecules include the methods used in genetic engineering to construct hybrid polynucleotides encoding the desired polypeptide sequences. Reference may be advantageously made, for example, to the technique for the production of genes encoding fusion proteins described by Minton in 1984.

Said hybrid polynucleotides encoding a hybrid polypeptide as well as the hybrid polypeptides according to the invention characterized in that they are recombinant proteins obtained by the expression of said hybrid polynucleotides also form part of the invention.

The polypeptides according to the invention may advantageously be used in a method for the in vitro detection of antibodies directed against said polypeptides, in particular the polypeptide DP428, and also of antibodies directed against a bacterium of the Mycobacterium tuberculosis complex, in a biological sample (biological tissue or fluid) capable of containing them, this method comprising bringing this biological sample into contact with a polypeptide according to the invention under conditions allowing an immunological reaction in vitro between said polypeptide and the antibodies which may be present in the biological sample, and detecting in vitro the antigen-antibody complexes which may be formed.

The polypeptides according to the invention may also and advantageously be used in a method for the detection of an infection by a bacterium of the Mycobacterium tuberculosis complex in a mammal based on the in vitro detection of a cellular reaction indicating prior sensitization of the mammal to said polypeptide, such as for example cell proliferation or the synthesis of proteins such as interferon-gamma. This method for the detection of an infection by a bacterium of the Mycobacterium tuberculosis complex in a mammal is characterized in that it comprises the following steps:

a) preparation of a biological sample containing cells of said mammal, more particularly cells of the immune system of said mammal and still more particularly T cells;

b) incubation of the biological sample of step a) with a polypeptide according to the invention;

c) detection of a cellular reaction indicating prior sensitization of the mammal to said polypeptide such as for example cell proliferation and/or the synthesis of proteins such as interferon-gamma.

Cell proliferation may be measured, for example, by incorporation of ³H-Thymidine.
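One common way of summarizing ³H-thymidine incorporation data, shown here as an assumption rather than as a requirement of the method, is a stimulation index: the mean counts per minute (cpm) of cells incubated with the polypeptide divided by the mean cpm of unstimulated control cells. The function name and the replicate values below are hypothetical.

```python
from statistics import mean


def stimulation_index(stimulated_cpm: list, unstimulated_cpm: list) -> float:
    """Return mean cpm with antigen divided by mean cpm without antigen."""
    baseline = mean(unstimulated_cpm)
    if baseline <= 0:
        raise ValueError("unstimulated counts must have a positive mean")
    return mean(stimulated_cpm) / baseline


if __name__ == "__main__":
    # Hypothetical triplicate wells, in counts per minute.
    with_polypeptide = [3000, 3200, 2800]
    medium_only = [500, 450, 550]
    print(stimulation_index(with_polypeptide, medium_only))
```

The threshold above which an index is regarded as indicating prior sensitization is not specified in the text and would have to be set by the user against appropriate controls.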

Also forming part of the invention are the methods for the detection of a delayed-type hypersensitivity (DTH) reaction, characterized in that they use a polypeptide according to the invention.

Preferably, the biological sample consists of a fluid, for example a human or animal serum, blood, biopsies, bronchoalveolar fluid or pleural fluid.

Any conventional procedure may be used to carry out such a detection.

By way of example, a preferred method uses immunoenzymatic procedures such as the ELISA, immunofluorescence or radioimmunoassay (RIA) technique and the like.

Thus, the invention also relates to the polypeptides according to the invention, labeled with the aid of a suitable marker of the enzymatic, fluorescent or radioactive type.

Such methods comprise, for example, the following steps:

deposition of predetermined quantities of a polypeptide composition according to the invention into the wells of a microtiter plate,

introduction into said wells of increasing dilutions of serum or of another biological sample as defined above, before being analyzed,

incubation of the microplate,

introduction into the wells of the microtiter plate of labeled antibodies directed against human or animal immunoglobulins, the labeling of these antibodies having been carried out with the aid of an enzyme selected from those which are capable of hydrolyzing a substrate while modifying its radiation absorption, at least at a defined wavelength, for example at 550 nm,

detection, by comparing with a control, of the quantity of substrate hydrolyzed.
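The final comparison with a control can be illustrated by computing an endpoint titer from the absorbance read in each well of a dilution series. The sketch below rests on assumptions that are not in the text: the function name, the example readings, and in particular the cutoff of twice the negative-control absorbance, which is only one of several conventions in use.

```python
def endpoint_titer(readings: dict, negative_control: float, cutoff_factor: float = 2.0):
    """Return the highest dilution factor whose absorbance exceeds the cutoff.

    `readings` maps a dilution factor (e.g. 100 for a 1:100 dilution) to the
    absorbance measured in that well. The cutoff is assumed to be
    `cutoff_factor` times the negative-control absorbance. Returns None when
    no well is above the cutoff.
    """
    cutoff = cutoff_factor * negative_control
    positive = [dilution for dilution, absorbance in readings.items() if absorbance > cutoff]
    return max(positive) if positive else None


if __name__ == "__main__":
    # Hypothetical absorbances for a two-fold serum dilution series.
    serum = {100: 1.8, 200: 1.2, 400: 0.7, 800: 0.3, 1600: 0.12}
    print(endpoint_titer(serum, negative_control=0.1))
```

A serum with no well above the cutoff is reported as `None`, which a caller can treat as a negative result at the dilutions tested.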

The invention also relates to a box or kit for the in vitro diagnosis of an infection by a mycobacterium belonging to the Mycobacterium tuberculosis complex, comprising:

a polypeptide according to the invention,

where appropriate, the reagents for constituting the medium which is appropriate for the immunological or specific reaction,

the reagents allowing the detection of the antigen-antibody complexes produced by the immunological reaction which may be present in the biological sample, and the in vitro detection of the antigen-antibody complexes which may be formed, it being possible for these reagents to also carry a marker, or to be capable of being recognized in turn by a labeled reagent, more particularly in the case where the polypeptide according to the invention is not labeled,

where appropriate, a reference biological sample (negative control) free of antibodies recognized by a polypeptide according to the invention,

where appropriate, a reference biological sample (positive control) containing a predetermined quantity of antibodies recognized by a polypeptide according to the invention.

The polypeptides according to the invention make it possible to prepare monoclonal or polyclonal antibodies which are characterized in that they recognize specifically the polypeptides according to the invention. The monoclonal antibodies may be advantageously prepared from hybridomas according to the technique described by Kohler and Milstein in 1975. The polyclonal antibodies may be prepared, for example, by immunizing an animal, in particular a mouse, with a polypeptide according to the invention combined with an immune response adjuvant, and then purifying the specific antibodies contained in the serum of the immunized animals on an affinity column to which the polypeptide which served as antigen has been attached beforehand. The polyclonal antibodies according to the invention may also be prepared by purifying, on an affinity column to which a polypeptide according to the invention has been immobilized beforehand, the antibodies contained in the serum of patients infected with a mycobacterium and preferably a bacterium belonging to the Mycobacterium tuberculosis complex.

The subject of the invention is also mono- or polyclonal antibodies or fragments thereof, or chimeric antibodies, characterized in that they are capable of recognizing specifically a polypeptide according to the invention.

The antibodies of the invention may also be labeled in the same manner as described above for the nucleic probes of the invention, such as a labeling of the enzymatic, fluorescent or radioactive type.

The invention relates, in addition, to a method for the specific detection of the presence of an antigen of a mycobacterium and preferably a bacterium of the Mycobacterium tuberculosis complex in a biological sample, characterized in that it comprises the following steps:

a) bringing the biological sample (biological tissue or fluid) collected from an individual into contact with a mono- or polyclonal antibody according to the invention, under conditions allowing an immunological reaction in vitro between said antibodies and the polypeptides specific to mycobacteria and preferably bacteria of the Mycobacterium tuberculosis complex which may be present in the biological sample, and

b) detection of the antigen-antibody complex formed.

Also coming within the scope of the invention is a box or kit for the in vitro diagnosis, on a biological sample, of the presence of strains of mycobacteria and preferably of bacteria belonging to the Mycobacterium tuberculosis complex, preferably M. tuberculosis, characterized in that it comprises:

a polyclonal or monoclonal antibody according to the invention, labeled where appropriate;

where appropriate, a reagent for constituting the medium which is appropriate for carrying out the immunological reaction;

a reagent allowing the detection of the antigen-antibody complexes produced by the immunological reaction, it being possible for this reagent to also carry a marker, or to be capable of being recognized in turn by a labeled reagent, more particularly in the case where said monoclonal or polyclonal antibody is not labeled;

where appropriate, reagents for carrying out the lysis of the cells of the sample tested.

The subject of the present invention is also a method for the detection and rapid identification of the mycobacteria and preferably of the M. tuberculosis bacteria in a biological sample, characterized in that it comprises the following steps:

a) isolation of the DNA from the biological sample to be analyzed, or production of a cDNA from the RNA of the biological sample;

b) specific amplification of the DNA of mycobacteria and preferably of bacteria belonging to the Mycobacterium tuberculosis complex with the aid of primers according to the invention;

c) analysis of the products of amplification.

The products of amplification may be analyzed by various methods.

Two methods of analysis are given by way of example below:

agarose gel electrophoretic analysis of the products of amplification. The presence of a DNA fragment which migrates to the expected position suggests that the sample analyzed contained DNA of mycobacteria belonging to the tuberculosis complex, or

analysis by the molecular hybridization technique using a nucleic probe according to the invention. This probe will be advantageously labeled with a nonradioactive (cold probe) or radioactive element.
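The "expected position" on the gel follows from the length of the fragment flanked by the two primers. A minimal in-silico sketch of that calculation is given below; it assumes exact primer matches on a single template strand, and the function names and example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def expected_amplicon(template: str, forward: str, reverse: str):
    """Return the fragment flanked by the two primers, or None if either primer does not match.

    `template` is the top strand read 5'->3'. The reverse primer anneals to
    that strand, so its binding site on the template is its reverse complement.
    """
    template = template.upper()
    forward = forward.upper()
    start = template.find(forward)
    if start < 0:
        return None
    site = reverse_complement(reverse)
    # Look for the reverse-primer site downstream of the forward primer.
    end = template.find(site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(site)]


if __name__ == "__main__":
    # Hypothetical template: a 21-nucleotide target with four flanking bases on each side.
    template = "CCCC" + "ATGGCCAAGCTTGGATCCTGA" + "GGGG"
    product = expected_amplicon(template, "ATGGCCAAGC", "TCAGGATCCA")
    print(len(product), product)
```

The length of the returned fragment is the size to look for on the gel; a `None` result corresponds to a template on which the primer pair is not expected to amplify anything.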

For the purposes of the present invention, “DNA of the biological sample” or “DNA contained in the biological sample” is understood to mean either the DNA present in the biological sample considered, or the cDNA obtained after the action of a reverse transcriptase-type enzyme on the RNA present in said biological sample.

Another method of the present invention allows the detection of an infection by a mycobacterium and preferably a bacterium of the Mycobacterium tuberculosis complex in a mammal. This method comprises the following steps:

a) preparation of a biological sample containing cells of said mammal, more particularly cells of the immune system of said mammal and still more particularly T cells;

b) incubation of the biological sample of step a) with a polypeptide according to the invention;

c) detection of a cellular reaction indicating prior sensitization of the mammal to said polypeptide, in particular cell proliferation and/or the synthesis of proteins such as interferon-gamma;

d) detection of a reaction of delayed hypersensitivity or of sensitization of the mammal to said polypeptide.

This method of detection is an intradermal method which is described for example by M. J. Elhay et al. (1998) Infection and Immunity, 66(7): 3454-3456.

Another aim of the present invention consists in a method for the detection of the mycobacteria and preferably the bacteria belonging to the Mycobacterium tuberculosis complex in a biological sample, characterized in that it comprises the following steps:

a) bringing an oligonucleotide probe according to the invention into contact with a biological sample, the DNA contained in the biological sample, or the cDNA obtained by reverse transcription of the RNA of the biological sample, having, where appropriate, been made accessible to the hybridization beforehand, under conditions allowing the hybridization of the probe with the DNA or the cDNA of the mycobacteria and preferably of the bacteria of the Mycobacterium tuberculosis complex;

b) detection of the hybrid formed between the oligonucleotide probe and the DNA of the biological sample.

The invention also relates to a method for the detection of the mycobacteria and preferably of the bacteria belonging to the Mycobacterium tuberculosis complex in a biological sample, characterized in that it comprises the following steps:

a) bringing an oligonucleotide probe according to the invention, immobilized on a support, into contact with a biological sample, the DNA of the biological sample having, where appropriate, been made accessible to the hybridization beforehand, under conditions allowing the hybridization of said probe with the DNA of the mycobacteria and preferably of the bacteria of the Mycobacterium tuberculosis complex;

b) bringing the hybrid formed between said oligonucleotide probe immobilized on a support and the DNA contained in the biological sample, where appropriate after removal of the DNA of the biological sample which has not hybridized with the probe, into contact with a labeled oligonucleotide probe according to the invention.

According to an advantageous embodiment of the method of detection defined above, it is characterized in that, prior to step a), the DNA of the biological sample is amplified beforehand with the aid of a pair of primers according to the invention.

Another embodiment of the method of detection according to the invention consists in a method for the detection of the presence of the mycobacteria and preferably the bacteria belonging to the Mycobacterium tuberculosis complex in a biological sample, characterized in that it comprises the following steps:

a) bringing the biological sample into contact with a pair of primers according to the invention, the DNA contained in the sample having been, where appropriate, made accessible to hybridization beforehand, under conditions allowing hybridization of said primers with the DNA of the mycobacteria and preferably of the bacteria of the Mycobacterium tuberculosis complex;

b) amplification of the DNA of a mycobacterium and preferably of a bacterium of the Mycobacterium tuberculosis complex;

c) detection of the amplification of the DNA fragments corresponding to the fragment flanked by the primers, for example by gel electrophoresis or by means of an oligonucleotide probe according to the invention.

A subject of the invention is also a method for the detection of thepresence of the mycobacteria and preferably the bacteria belonging tothe Mycobacterium tuberculosis complex in a biological sample by stranddisplacement, characterized in that it comprises the following steps:

a) bringing the biological sample into contact with two pairs of primersaccording to the invention specifically intended for amplification ofthe SDA type described above, the DNA content in the sample having been,where appropriate, made accessible to hybridization beforehand, underconditions allowing hybridization of the primers with the DNA of themycobacteria and preferably the bacteria of the Mycobacteriumtuberculosis complex;

b) amplification of the DNA of the mycobacteria and preferably of thebacteria of the Mycobacterium tuberculosis complex;

c) detection of the amplification of DNA fragments corresponding to thefragment flanked by the primers, for example by gel electrophoresis orby means of an oligonucleotide probe according to the invention.

The invention also relates to a box or kit for carrying out the methoddescribed above, intended for the detection of the presence of themycobacteria and preferably the bacteria of the Mycobacteriumtuberculosis complex in a biological sample, characterized in that itcomprises the following components:

a) an oligonucleotide probe according to the invention;

b) the reagents necessary for carrying out a hybridization reaction;

c) where appropriate, a pair of primers according to the invention as well as the reagents necessary for a reaction of amplification of the DNA (genomic DNA, plasmid DNA or cDNA) of mycobacteria and preferably of bacteria of the Mycobacterium tuberculosis complex.

The subject of the invention is also a kit or box for the detection of the presence of the mycobacteria and preferably the bacteria of the Mycobacterium tuberculosis complex in a biological sample, characterized in that it comprises the following components:

a) an oligonucleotide probe, termed capture probe, according to the invention;

b) an oligonucleotide probe, termed revealing probe, according to the invention;

c) where appropriate, a pair of primers according to the invention as well as the reagents necessary for a reaction of amplification of the DNA of mycobacteria and preferably of bacteria of the Mycobacterium tuberculosis complex.

The invention also relates to a kit or box for the amplification of the DNA of the mycobacteria and preferably the bacteria of the Mycobacterium tuberculosis complex present in a biological sample, characterized in that it comprises the following components:

a) a pair of primers according to the invention;

b) the reagents necessary for carrying out a DNA amplification reaction;

c) optionally, a component which makes it possible to verify the sequence of the amplified fragment, more particularly an oligonucleotide probe according to the invention.

Another subject of the present invention relates to an immunogenic composition, characterized in that it comprises a polypeptide according to the invention.

Another immunogenic composition according to the invention is characterized in that it comprises one or more polypeptides according to the invention and/or one or more hybrid polypeptides according to the invention.

According to an advantageous embodiment, the above-defined immunogenic composition constitutes a vaccine when it is provided in combination with a pharmaceutically acceptable vehicle and optionally one or more immunity adjuvants such as alum or a representative of the family of muramyl peptides or alternatively incomplete Freund's adjuvant.

Various types of vaccine are currently available for protecting humans against infectious diseases: attenuated live microorganisms (M. bovis-BCG for tuberculosis), inactivated microorganisms (influenza virus), acellular extracts (Bordetella pertussis for whooping cough), recombinant proteins (hepatitis B virus surface antigen), polysaccharides (pneumococci). Experiments are being carried out on vaccines prepared from synthetic peptides or genetically modified microorganisms expressing heterologous antigens. More recently still, recombinant plasmid DNAs carrying genes encoding protective antigens have been proposed as an alternative vaccine strategy. This type of vaccination is carried out with a specific plasmid which is derived from an E. coli plasmid which does not replicate in vivo and which encodes only the vaccinal protein. The principal functional components of this plasmid are: a strong promoter allowing expression in eukaryotic cells (for example that of CMV), an appropriate cloning site for inserting the gene of interest, a termination-polyadenylation sequence, a prokaryotic replication origin for producing the recombinant plasmid in vitro and a selectable marker (for example the ampicillin-resistance gene) for facilitating the selection of the bacteria which contain the plasmid. Animals were immunized by simply injecting the naked plasmid DNA into the muscle. This technique leads to the expression of the vaccinal protein in situ and to an immune response in particular of the cellular type (CTL) and of the humoral type (antibody). This double induction of the immune response is one of the main advantages of the vaccination technique with naked DNA.

Huygen et al. (1996) and Tascon et al. (1996) succeeded in obtaining a degree of protection against M. tuberculosis by injecting recombinant plasmids containing M. leprae genes (hsp65, 36 kDa pra) as inserts. M. leprae is the agent responsible for leprosy. The use of an insert specific to M. tuberculosis such as, for example, the whole or part of the DP428 gene, which is the subject of the present invention, would probably lead to a better protection against tuberculosis. The whole or part of the DP428 gene, or any polynucleotide according to the invention, can be easily inserted into the plasmid vectors V1J (Montgomery et al., 1993), pcDNA3 (Invitrogen, R & D Systems) or pcDNA1/Neo (Invitrogen) which possess the necessary characteristics for a vaccinal use.

The invention thus relates to a vaccine, characterized in that it comprises one or more polypeptides according to the invention and/or one or more hybrid polypeptides according to the invention as previously defined, in combination with a pharmaceutically compatible vehicle and, where appropriate, one or more appropriate immunity adjuvants.

The invention also relates to a vaccine composition intended for the immunization of humans or animals against a bacterial or viral infection, such as tuberculosis or hepatitis, characterized in that it comprises one or more hybrid polypeptides as previously defined in combination with a pharmaceutically compatible vehicle and, where appropriate, one or more immunity adjuvants.

Advantageously, in the case of a protein which is a hybrid between a polypeptide according to the invention and the hepatitis B surface antigen, the vaccine composition will be administered, in humans, in an amount of 0.1 to 1 μg of purified hybrid protein per kilogram of the weight of the patient, preferably 0.2 to 0.5 μg/kg of the weight of the patient, for a dose intended for a given administration. In the case of patients suffering from disorders of the immune system, in particular immunosuppressed patients, each injected dose will preferably contain half of the quantity, by weight, of the hybrid protein contained in a dose intended for a patient not suffering from immune system disorders.

Preferably, the vaccine composition will be administered several times, spread out over time, by the intradermal or subcutaneous route. By way of example, three doses as defined above will be administered, respectively, to the patient at time t0, at time t0+1 month and at time t0+1 year.

Alternatively, three doses will be administered, respectively, to the patient at time t0, at time t0+1 month and at time t0+6 months.
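The dosing arithmetic set out above can be illustrated as follows. This is a minimal sketch, assuming the per-kilogram ranges and the halving for immunosuppressed patients are applied exactly as stated; the function and constant names are hypothetical, and the sketch is illustrative only and not clinical guidance.

```python
# Per-kilogram ranges from the text (micrograms of purified hybrid
# protein per kilogram of patient weight, per administration).
GENERAL_RANGE_UG_PER_KG = (0.1, 1.0)
PREFERRED_RANGE_UG_PER_KG = (0.2, 0.5)

# Months after t0 at which the three doses are given (two schedules
# given by way of example in the text).
SCHEDULES_MONTHS = {
    "t0, t0+1 month, t0+1 year": (0, 1, 12),
    "t0, t0+1 month, t0+6 months": (0, 1, 6),
}


def dose_range_ug(weight_kg, preferred=False, immunosuppressed=False):
    """Return the (low, high) dose in micrograms for one administration.

    Hypothetical helper: immunosuppressed patients receive half of the
    quantity intended for a patient without immune system disorders.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    low, high = PREFERRED_RANGE_UG_PER_KG if preferred else GENERAL_RANGE_UG_PER_KG
    factor = 0.5 if immunosuppressed else 1.0
    return low * weight_kg * factor, high * weight_kg * factor


# Example: a 70 kg patient, general range, then preferred range halved.
print(dose_range_ug(70.0))
print(dose_range_ug(70.0, preferred=True, immunosuppressed=True))
```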

In mice, in which a weight dose of the vaccine composition comparable to the dose used in humans is administered, the antibody reaction is tested by collecting serum followed by a study of the formation of a complex between the antibodies present in the serum and the antigen of the vaccine composition, according to the customary techniques.

The invention also relates to an immunogenic composition characterized in that it comprises a polynucleotide or an expression vector according to the invention, in combination with a vehicle allowing its administration to humans or animals.

The subject of the invention is also a vaccine intended for immunizing against a bacterial or viral infection, such as tuberculosis or hepatitis, characterized in that it comprises a polynucleotide or an expression vector according to the invention, in combination with a pharmaceutically acceptable vehicle.

Such immunogenic or vaccine compositions are in particular described in international application No. WO 90/11092 (Vical Inc.) and also in international application No. WO 95/11307 (Institut Pasteur).

The constituent polynucleotide of the immunogenic composition or of the vaccine composition according to the invention may be injected into the host after having been coupled with compounds which promote the penetration of this polynucleotide into the cell or its transport to the cell nucleus. The resulting conjugates may be encapsulated into polymer microparticles, as described in international application No. WO 94/27238 (Medisorb Technologies International).

According to another embodiment of the immunogenic and/or vaccine composition according to the invention, the polynucleotide, preferably a DNA, is complexed with DEAE-dextran (Pagano et al., 1967) or with nuclear proteins (Kaneda et al., 1989), with lipids (Felgner et al., 1987) or encapsulated into liposomes (Fraley et al., 1980).

According to yet another advantageous embodiment of the immunogenic and/or vaccine composition according to the invention, the polynucleotide according to the invention may be introduced in the form of a gel facilitating its transfection into cells. Such a composition in gel form may be a poly-L-lysine and lactose complex, as described by Midoux in 1993, or Poloxamer 407™, as described by Pastore in 1994. The polynucleotide or the vector according to the invention may also be in suspension in a buffer solution or may be combined with liposomes.

Advantageously, such a vaccine will be prepared in accordance with the technique described by Tascon et al. or Huygen et al. in 1996 or in accordance with the technique described by Davis et al. in international application No. WO 95/11307 (Whalen et al.).

Such a vaccine will be advantageously prepared in the form of a composition containing a vector according to the invention, placed under the control of regulatory elements allowing its expression in humans or animals.

To produce such a vaccine, the polynucleotide according to the invention is first of all subcloned into an appropriate expression vector, particularly an expression vector containing regulatory and expression signals recognized by the enzymes in eukaryotic cells and also containing a replication origin which is active in prokaryotes, for example in E. coli, which allows its prior amplification. The purified recombinant plasmid obtained is then injected into the host, for example by the intramuscular route.

It will be possible, for example, to use as vector for expressing in vivo the antigen of interest the plasmid pcDNA3 or the plasmid pcDNA1/neo, both marketed by Invitrogen (R&D Systems, Abingdon, United Kingdom). It is also possible to use the plasmid V1Jns.tPA described by Shiver et al. in 1995.

Such a vaccine will advantageously comprise, in addition to the recombinant vector, a saline solution, for example a sodium chloride solution.

A vaccine composition as defined above will be, for example, administered by the parenteral route or by the intramuscular route.

The present invention also relates to a vaccine characterized in that it contains one or more nucleotide sequences according to the invention and/or one or more polynucleotides as mentioned above in combination with a pharmaceutically compatible vehicle and, where appropriate, one or more appropriate immunity adjuvants.

Another aspect relates to a method of screening molecules capable of inhibiting the growth of mycobacteria or the maintenance of mycobacteria in a host, characterized in that said molecules block the synthesis or the function of the polypeptides encoded by a nucleotide sequence according to the invention or by a polynucleotide as described supra.

In said method of screening, the molecules may be anti-messengers or may induce the synthesis of anti-messengers.

The present invention also relates to molecules capable of inhibiting the growth of mycobacteria or the maintenance of mycobacteria in a host, characterized in that said molecules are synthesized based on the structure of the polypeptides encoded by a nucleotide sequence according to the invention or by a polynucleotide as described supra.

Other characteristics and advantages of the invention appear in the following examples and figures:

FIGURES

The FIG. 1 Series:

The FIG. 1 series illustrates the series of nucleotide sequences SEQ ID NOS: 1, 8, 14, 25, 31, and 33 corresponding to the insert of the vector pDP428 (deposited at the CNCM under the No. I-1818) and the series of amino acid sequences SEQ ID NOS: 2-7, 9-13, 15-24, 26-30, 32, 34 of the polypeptides encoded by the series of nucleotide sequences SEQ ID NOS: 1, 8, 14, 25, 31, and 33.

FIG. 2:

Illustrates the nucleotide sequence SEQ ID NO: 35 corresponding to the region including the gene encoding the polypeptide DP428 (region underlined). Both the ATG and GTG codons for initiation of translation were taken into account in this figure. The figure shows that the gene encoding the polypeptide DP428 is probably part of an operon comprising at least three genes. The double-boxed region probably includes the promoter regions.

The single-boxed region corresponds to the motif LPISG (SEQ ID NO: 934) which resembles the motif LPXTG (SEQ ID NO: 935) described in Gram-positive bacteria as allowing anchorage to peptidoglycans.

The FIG. 3 Series:

The FIG. 3 series represents the series of nucleotide sequences SEQ ID NOS: 41, 46, 52 corresponding to the insert of the vector p6D7 (deposited at the CNCM under the No. I-1814) and the series of amino acid sequences SEQ ID NOS: 42-45, 47-51, and 53-55.

The FIG. 4 Series:

The FIG. 4 series represents the series of nucleotide sequences SEQ ID NOS: 56, 62, 64, 67, 69, 72, 74, 76, 78, 81, 84, and 86 corresponding to the insert of the vector p5A3 (deposited at the CNCM under the No. I-1815) and the series of amino acid sequences SEQ ID NOS: 57-61, 63, 65-66, 68, 70-71, 73, 75, 77, 79-80, 82-83, 85, and 87.

The FIG. 5 Series:

The FIG. 5 series represents the series of nucleotide sequences SEQ ID NOS: 88, 90, 92, 96, 98, 100, 104, 106, and 108 corresponding to the insert of the vector p5F6 (deposited at the CNCM under the No. I-1816) and the series of amino acid sequences SEQ ID NOS: 93-95, 97, 99, 101-103, 105, 107, and 109.

The FIG. 6 Series:

The FIG. 6 series represents the series of nucleotide sequences SEQ ID NOS: 110, 113, and 119 corresponding to the insert of the vector p2A29 (deposited at the CNCM under the No. I-1817) and the series of amino acid sequences SEQ ID NOS: 111-112, 114-118, and 120-121.

The FIG. 7 Series:

The FIG. 7 series represents the series of nucleotide sequences SEQ ID NOS: 122, 128, and 133 corresponding to the insert of the vector p5B5 (deposited at the CNCM under the No. I-1819) and the series of amino acid sequences SEQ ID NOS: 123-127, 129-132, and 134-136.

The FIG. 8 Series:

The FIG. 8 series represents the series of nucleotide sequences SEQ ID NOS: 137, 139, 141, 143, 145, 148, 150, 152, 154, and 156 corresponding to the insert of the vector p1C7 (deposited at the CNCM under the No. I-1820) and the series of amino acid sequences SEQ ID NOS: 138, 272-273, 140, 142, 144, 146-147, 149, 151, 153, 155, and 157.

The FIG. 9 Series:

The FIG. 9 series represents the series of nucleotide sequences SEQ ID NOS: 158, 160, and 162 corresponding to the insert of the vector p2D7 (deposited at the CNCM under the No. I-1821) and the series of amino acid sequences SEQ ID NOS: 159, 161, 163, and 164.

The FIG. 10 Series:

The FIG. 10 series represents the series of nucleotide sequences SEQ ID NOS: 165, 169, and 177 corresponding to the insert of the vector p1B7 (deposited at the CNCM under the No. I-1843) and the series of amino acid sequences SEQ ID NOS: 166-168, 170-176, 178-183.

The FIG. 11 Series:

The FIG. 11 series represents the series of nucleotide sequences SEQ ID NOS: 184, 189, 195, 200, 202, 206, 209, and 211 and the series of amino acid sequences SEQ ID NOS: 185-188, 190-194, 196-199, 201, 203-205, 207-208, 210, and 212.

The FIG. 12 Series:

The FIG. 12 series represents the series of nucleotide sequences SEQ ID NOS: 213, 217, and 220 and the series of amino acid sequences SEQ ID NOS: 214-216, 218-219, and 221-224.

The FIG. 13 Series:

The FIG. 13 series represents the series of nucleotide sequences SEQ ID NOS: 225, 228, 238, 246, 250, 255, 258, and 260 and the series of amino acid sequences SEQ ID NOS: 226-227, 923-925, 229-237, 239-245, 247-249, 251-254, 256-257, 259, and 261.

The FIG. 14 Series:

The FIG. 14 series represents the series of nucleotide sequences SEQ ID NOS: 262, 268, 274, 278, 280, 282, 284, 286, 288, 297, 290, and 310 corresponding to the insert of the vector p5B5 (deposited at the CNCM under the No. I-1819) and the series of amino acid sequences SEQ ID NOS: 263-267, 269-271, 275-277, 279, 281, 283, 285, 287, 289, 291-296, 298-309, and 311-316.

The FIG. 15 Series:

The FIG. 15 series represents the series of nucleotide sequences SEQ ID NOS: 317, 321, 323, 325, 327, 331, 333, 335, 337, 339, 346, and 347 and the series of amino acid sequences SEQ ID NOS: 318-320, 322, 324, 326, 328-330, 332, 334, 336, 338, 340-345, and 348-352.

The FIG. 16 Series:

The FIG. 16 series represents the series of nucleotide sequences SEQ ID NOS: 353, 357, and 359 and the series of amino acid sequences SEQ ID NOS: 354-356, 358, 360, and 926-930.

The FIG. 17 Series:

The FIG. 17 series represents the series of nucleotide sequences SEQ ID NOS: 361, 364, 368, 371, 374, 380, 383, and 385 and the series of amino acid sequences SEQ ID NOS: 362-363, 365-367, 369-370, 372, 373, 375-379, 381-382, 384, and 386.

The FIG. 18 Series:

The FIG. 18 series represents the series of nucleotide sequences SEQ ID NOS: 387, 389, 393, 395, 397, 399, 403, and 405 and the series of amino acid sequences SEQ ID NOS: 388, 390-392, 394, 396, 398, 400-402, 404, and 406.

The FIG. 19 Series:

The FIG. 19 series represents the series of nucleotide sequences SEQ ID NOS: 407, 410, 412, 419, 421, 426, 429, and 431 and the series of amino acid sequences SEQ ID NOS: 408-409, 411, 413-418, 420, 422-425, 427-428, 430, and 432.

The FIG. 20 Series:

The FIG. 20 series represents the series of nucleotide sequences SEQ ID NOS: 433, 437, 441, 447, 452, 456, 459, and 461 corresponding to the insert of the vector p2A29 (deposited at the CNCM under the No. I-1817) and the series of amino acid sequences SEQ ID NOS: 434-436, 438-440, 442-446, 448-451, 453-455, 457-458, 460, and 462.

The FIG. 21 Series:

The FIG. 21 series represents the series of nucleotide sequences SEQ ID NOS: 463, 469, 472, 474, 476, 482, 485, and 487 and the series of amino acid sequences SEQ ID NOS: 464-468, 470, 471, 473, 475, 477-481, 483-484, 486, and 488.

The FIG. 22 Series:

The FIG. 22 series represents the series of nucleotide sequences SEQ ID NOS: 489, 495, and 497 and the series of amino acid sequences SEQ ID NOS: 490-494, 496, and 498-500.

The FIG. 23 Series:

The FIG. 23 series represents the series of nucleotide sequences SEQ ID NOS: 501, 505, and 510 and the series of amino acid sequences SEQ ID NOS: 502-504, 506-509, and 511-515.

The FIG. 24 Series:

The FIG. 24 series represents the series of nucleotide sequences SEQ ID NOS: 516, 519, and 522 and the series of amino acid sequences SEQ ID NOS: 517-518, 520-521, and 523-527.

FIGS. 25 and 26:

FIGS. 25 and 26 illustrate, respectively, the sequences SEQ ID NO: 528 and SEQ ID NO: 529 representing a pair of primers used to specifically amplify, by PCR, the region corresponding to nucleotides 964 to 1234 included in the sequence SEQ ID NOS: 1, 8, 14, 25, 31, and 33.

The FIG. 27 Series:

The FIG. 27 series represents the series of nucleotide sequences SEQ ID NOS: 530, 534, and 537 corresponding to the insert of the vector p5A3 and the series of amino acid sequences SEQ ID NOS: 531-533, 535-536, and 538-542.

FIG. 28:

The amino acid sequence as defined in FIG. 28 represents the amino acid sequence SEQ ID NO: 543 corresponding to the polypeptide DP428.

FIG. 29:

FIG. 29 represents the nucleotide sequence SEQ ID NO: 544 of the complete gene encoding the M1C25 protein.

FIG. 30:

FIG. 30 represents the amino acid sequence SEQ ID NO: 545 of the M1C25 protein.

The FIG. 31 Series:

The FIG. 31 series represents the series of nucleotide sequences SEQ ID NOS: 546, 550, 552, and 554 and the series of amino acid sequences SEQ ID NOS: 547-549, 551, 553, and 555.

The FIG. 32 Series:

The FIG. 32 series represents the series of nucleotide sequences SEQ ID NOS: 556, 558, 564, 569, and 571 and the series of amino acid sequences SEQ ID NOS: 557, 559-563, 565-568, 570, and 572.

The FIG. 33 Series:

The FIG. 33 series represents the series of nucleotide sequences SEQ ID NOS: 573, 576, 580, 584, and 586 and the series of amino acid sequences SEQ ID NOS: 574-575, 577-579, 581-583, 585, and 587.

The FIG. 34 Series:

The FIG. 34 series represents the series of nucleotide sequences SEQ ID NOS: 588, 590, 594, and 596 and the series of amino acid sequences SEQ ID NOS: 587, 589, 591-593, 595, and 597.

The FIG. 35 Series:

The FIG. 35 series represents the series of nucleotide sequences SEQ ID NOS: 598, 600, 604, 608, and 610 and the series of amino acid sequences SEQ ID NOS: 599, 601-603, 605-607, 609, and 611.

The FIG. 36 Series:

The FIG. 36 series represents the series of nucleotide sequences SEQ ID NOS: 612, 614, 616, 618, and 620 and the series of amino acid sequences SEQ ID NOS: 613, 615, 617, 619, and 621.

The FIG. 37 Series:

The FIG. 37 series represents the series of nucleotide sequences SEQ ID NOS: 622, 624, 626, 629, and 631 and the series of amino acid sequences SEQ ID NOS: 623, 625, 627-628, 630, and 632.

The FIG. 38 Series:

The FIG. 38 series represents the series of nucleotide sequences SEQ ID NOS: 633, 635, 640, 647, and 649, and the series of amino acid sequences SEQ ID NOS: 634, 636-639, 641-646, 648, and 650.

The FIG. 39 Series:

The FIG. 39 series represents the series of nucleotide sequences SEQ ID NOS: 651, 653, 657, 660, and 662 and the series of amino acid sequences SEQ ID NOS: 652, 654-656, 658-659, 661, and 663.

The FIG. 40 Series:

The FIG. 40 series represents the series of nucleotide sequences SEQ ID NOS: 664, 666, 669, 674, and 676, and the series of amino acid sequences SEQ ID NOS: 665, 931-933, 667-668, 670-673, 675, and 677.

The FIG. 41 Series:

The FIG. 41 series represents the series of nucleotide sequences SEQ ID NOS: 678, 683, 686, 691, 693, 695, 697, 702, and 717 corresponding to the insert of the vector p2D7 (deposited at the CNCM under the No. I-1821) and the series of amino acid sequences SEQ ID NOS: 679-682, 684, 685, 687-690, 692, 694, 696, 698-701, 703-716, and 718-727.

The FIG. 42 Series:

The FIG. 42 series represents the series of nucleotide sequences SEQ ID NOS: 728, 733, 736, 739, and 741 and the series of amino acid sequences SEQ ID NOS: 729-732, 734-735, 737-738, 740, and 742.

The FIG. 43 Series:

The FIG. 43 series represents the series of nucleotide sequences SEQ ID NOS: 743, 746, 752, 755, and 757 and the series of amino acid sequences SEQ ID NOS: 744-745, 747-751, 753-754, 756, and 758.

The FIG. 44 Series:

The FIG. 44 series represents the series of nucleotide sequences SEQ ID NOS: 759, 761, 764, 767, and 769, and the series of amino acid sequences SEQ ID NOS: 760, 762, 763, 765-766, 768, and 770.

The FIG. 45 Series:

The FIG. 45 series represents the series of nucleotide sequences SEQ ID NOS: 771, 784, 794, 805, 807, and 809 and the series of amino acid sequences SEQ ID NOS: 772-783, 785-793, 795-804, 806, 808, and 810.

The FIG. 46 Series:

The FIG. 46 series represents the series of nucleotide sequences SEQ ID NOS: 811, 813, 817, 821, and 823 and the series of amino acid sequences SEQ ID NOS: 812, 814-816, 818-820, 822, and 824.

The FIG. 47 Series:

The FIG. 47 series represents the series of nucleotide sequences SEQ ID NOS: 825, 827, 831, 833, and 835 and the series of amino acid sequences SEQ ID NOS: 826, 828-830, 832, 834, and 836.

The FIG. 48 Series:

The FIG. 48 series represents the series of nucleotide sequences SEQ ID NOS: 837, 839, 842, 844, and 846 and the series of amino acid sequences SEQ ID NOS: 838, 840-841, 843, 845, and 847.

The FIG. 49 Series:

The FIG. 49 series represents the series of nucleotide sequences SEQ ID NOS: 848, 864, 878, 883, and 885 and the series of amino acid sequences SEQ ID NOS: 849-863, 865-877, 879, 880-882, 884, and 886.

The FIG. 50 Series:

The FIG. 50 series represents the series of nucleotide sequences SEQ ID NOS: 887, 895, 901, 907, and 909 and the series of amino acid sequences SEQ ID NOS: 888-894, 896-900, 902-906, 908, and 910.

FIG. 51:

A. Construct pJVED: shuttle plasmid (capable of multiplying in mycobacteria as well as in E. coli) with a kanamycin-resistance gene (derived from Tn903) as a selectable marker. The truncated phoA gene (ΔphoA) and the luc gene form a synthetic operon.

B. Joining sequence (SEQ ID NO: 922) between phoA and luc.

FIG. 52:

Genomic hybridization (Southern blotting) of the genomic DNA of various mycobacterial species with the aid of an oligonucleotide probe whose sequence is the sequence between the nucleotide at position nt 964 (5′ end of the probe) and the nucleotide at position nt 1234 (3′ end of the probe), ends included, of the sequence SEQ ID NOS: 1, 8, 14, 25, 31, and 33.

FIGS. 53 and 54:

Luc and PhoA activities of recombinant M. smegmatis containing pJVED with various nucleotide fragments as described in the examples. FIGS. 53 and 54 represent the results obtained for two separate experiments carried out under the same conditions.

FIG. 55:

Representation of the hydrophobicity (Kyte and Doolittle) of the coding sequence of the polypeptide DP428 with its schematic representation. The LPISG (SEQ ID NO: 934) motif immediately precedes the hydrophobic C-terminal region. The sequence ends with two arginines.

FIG. 56:

Representation of the hydrophobicity (Kyte and Doolittle) of the sequence of the polypeptide M1C25 having the amino acid sequence SEQ ID NO: 545.
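For orientation, a hydrophobicity representation of the Kyte and Doolittle type can be sketched as a sliding-window average of per-residue hydropathy values. The following is a minimal sketch; the window size of 9 residues, the function name and the toy peptide are assumptions for illustration, since the parameters actually used for FIGS. 55 and 56 are not stated here.

```python
# Kyte and Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of each window along the sequence.

    The window size is an assumption; positive values indicate
    hydrophobic stretches, negative values hydrophilic ones.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical toy peptide (not the DP428 sequence): an LPISG motif
# followed by a hydrophobic stretch and two arginines.
toy_peptide = "LPISGAVLIVAAGLLRR"
for position, value in enumerate(hydropathy_profile(toy_peptide, window=5), start=1):
    print(position, round(value, 2))
```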

FIG. 57:

A. Acrylamide gel (12%) under denaturing conditions of a bacterial extract obtained by sonication of E. coli M15 bacteria containing the plasmid pM1C25 without and after 4 hours of induction with IPTG, stained with Coomassie Blue.

Lane 1: Molecular weight marker (Prestained SDS-PAGE Standards High Range BIO-RAD®).

Lane 2: Bacterial extract obtained by sonication of E. coli M15 bacteria containing the plasmid pM1C25 without induction with IPTG.

Lane 3: Bacterial extract obtained by sonication of E. coli M15 bacteria containing the plasmid pM1C25 after 4 hours of induction with IPTG.

Lane 4: Molecular weight marker (Prestained SDS-PAGE Standards Low Range BIO-RAD®).

B. Western blotting of a similar gel (12% acrylamide) visualized by means of the penta-His antibody marketed by the company Qiagen.

Lane 1: Representation of the molecular weight marker (Prestained SDS-PAGE Standards High Range BIO-RAD®).

Lane 2: Bacterial extract obtained by sonication of E. coli M15 bacteria containing the plasmid pM1C25 without induction with IPTG.

Lane 3: Bacterial extract obtained by sonication of E. coli M15 bacteria containing the plasmid pM1C25 after 4 hours of induction with IPTG.

Lane 4: Representation of the molecular weight marker (Prestained SDS-PAGE Standards Low Range BIO-RAD®).

The band which is most predominantly present in the lanes corresponding to the bacteria induced with IPTG compared with those not induced with IPTG, between 34,200 and 28,400 daltons, corresponds to the expression of the insert M1C25 cloned into the vector pQE-60 (Qiagen®).

As regards the legend to the other figures which are numbered by an alphanumeric character, each of these other figures represents the nucleotide sequence and the amino acid sequence having the SEQ ID sequence whose numbering is identical to the alphanumeric character of each of said figures.

The alphanumeric numberings of the figures representing the SEQ IDs comprising a number followed by a letter have the following meanings:

the alphanumeric numberings having the same number relate to the same family of sequences attached to the reference SEQ ID sequence whose numbering has this same number and the letter A;

the letters A, B and C for the same family of sequences distinguish the three possible reading frames of the reference SEQ ID nucleotide sequence (A);

the letters with a prime (′) index mean that the sequence corresponds to a fragment of the reference SEQ ID sequence (A);

the letter D means that the sequence corresponds to the sequence of the gene predicted by Cole et al., 1998;

the letter F means that the sequence corresponds to the open reading frame (ORF) containing the corresponding “D” sequence according to Cole et al., 1998;

the letter G means that the sequence is a sequence predicted by Cole et al., 1998, and exhibiting a homology of more than 70% with the reference SEQ ID sequence (A);

the letter H means that the sequence corresponds to the open reading frame containing the corresponding “G” sequence according to Cole et al., 1998;

the letter R means that the sequence corresponds to a sequence predicted by Cole et al., 1998, upstream of the corresponding “D” sequence and capable of being in phase with the sequence “D” because of possible sequencing errors;

the letter P means that the sequence corresponds to the open reading frame containing the corresponding “R” sequence;

the letter Q means that the sequence corresponds to a sequence containing the corresponding “F” and “P” sequences.
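The three possible reading frames distinguished by the letters A, B and C in the list above can be illustrated by translating one strand from offsets 0, 1 and 2. The following is a minimal sketch using the standard genetic code; the mapping of the letters to those offsets, the function names and the toy sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def three_frames(dna):
    """Return the translations of the three forward reading frames.

    Assumption: letter A is taken as offset 0, B as offset 1 and
    C as offset 2 on the given strand.
    """
    return {
        "A": translate(dna),
        "B": translate(dna[1:]),
        "C": translate(dna[2:]),
    }


# Hypothetical toy sequence (not a sequence of the invention).
print(three_frames("ATGGCCTAA"))
```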

As regards the sequence family SEQ ID NOS: 56-87, the insert preceding phoA contains two fragments which are noncontiguous on the genome, SEQ ID NO: 76 and SEQ ID NO: 56, and which are therefore derived from a multiple cloning allowing the expression and export of phoA. These two noncontiguous fragments, the genes and the open reading frames containing them according to Cole et al., 1998, are important for the export of an antigenic polypeptide:

the letters J, K and L distinguish the three possible reading frames of the corresponding nucleotide sequence “J”;

the letter M means that the sequence corresponds to the sequence predicted by Cole et al., 1998, and containing the sequence SEQ ID NO: 77;

the letter N means that the sequence corresponds to the open reading frame containing the sequence SEQ ID NO: 84.

As regards the sequence family SEQ ID NOS: 771-810, the letter Z means that the sequence corresponds to the sequence of a cloned fragment fused with phoA.

Finally, as regards the sequence family SEQ ID NOS: 678-727, the letter S means that the sequence corresponds to a sequence predicted by Cole et al., 1998 and which may be in the same reading frame as the corresponding sequence “D”, the letter T meaning that the corresponding sequence contains the corresponding sequences “F” and “S”.

EXAMPLES

Materials and methods

Bacterial Cultures, Plasmids and Culture Media

E. coli was cultured on Luria-Bertani (LB) solid or liquid medium. M. smegmatis was cultured on Middlebrook 7H9 liquid medium (Difco) supplemented with albumin-dextrose (ADC), 0.2% glycerol and 0.05% Tween, or on solid L medium. If necessary, the antibiotic kanamycin was added at a concentration of 20 μg/ml. The bacterial clones having a PhoA activity were detected on LB agar containing 5-bromo-4-chloro-3-indolyl phosphate (X-P, at 40 μg/ml).

Manipulation of DNA and Sequencing

The manipulations of DNA and the Southern-blot analyses were carried out using the standard techniques (Sambrook et al., 1989). The double-stranded DNA sequences were determined with a Taq Dye Deoxy Terminator Cycle sequencing kit (Applied Biosystems), in a System 9600 GeneAmp PCR (Perkin-Elmer), and after migration on a model 373 DNA analyzing system (Applied Biosystems).

Constructions of the Plasmids

The plasmid pJVEDa was constructed from pLA71, a transfer plasmidcomprising the phoA gene which is truncated and placed in phase withBlaF. pLA71 was cleaved with the restriction enzymes KpnI and NotI, thusremoving phoA without affecting the promoter of BlaF The luc geneencoding the firefly luciferase was amplified from pGEM-/luc and aribosome-binding site was added. phoA was amplified from pJEM11. Theamplified fragments were cleaved with PstI and ligated together. Theoligodeoxynucleotides used are the following: (SEQ ID NO: 911)pPV.Iuc.Fw: 5′GACTGCTGCAGAAGGAGAAGATCCAAATGG3′ (SEQ ID NO: 912) Iuc.Bw:5′GACTAGCGGCCGCGAATTCGTCGACCTCCGAGG3′ (SEQ ID NO: 913) pJEM.phoA.Fw:5′CCGCGGATCCGGATACGTAC3′ (SEQ ID NO: 914) phoA.Bw:5′GACTGCTGCAGTTTATTTCAGCCCCAGAGCG3′.

The fragment thus obtained was reamplified using the oligonucleotidescomplementary to its ends, cleaved with KpnI and NotI, and integratedinto pLA71 cleaved with the same enzymes. The resulting construct waselectroporated into E. coli DH5α and M. smegmatis mc² 155. An M.smegmatis clone emitting light and having a phoA activity was selectedand called pJVED/blaF The insert was removed using BamHI and theconstruct closed again on itself, thus reconstructing pJVED_(a). Toobtain pJVED_(b,c), the multiple cloning site was cleaved with ScaI andKpnI and closed again, removing one (pJVED_(b)) or two (pJVED_(c))nucleotides from the SnaBI site. After fusion, it was thus possible toobtain six reading frames. The insert of pJVED/hsp18 was obtained bypolymerase chain amplification (PCR) of pPM1745 (Servant et al., 1995)using oligonucleotides having the sequence: 18.Fw:5′GTACCAGTACTGATCACCCGTCTCCCGCAC3′ (SEQ ID NO; 915) 18.Back:AGTCAGGTACCTCGCGGAAGGGGTCAGTGCG3′ (SEQ ID NO: 916)

The product was cleaved with KpnI and ScaI, and ligated to pJVED_(a) cleaved with the same enzymes, thus giving pJVED/hsp18.

pJVED/P19 kDa and pJVED/erp were constructed by cleaving with BamHI the insert of pExp410 and pExp53, respectively, and inserting them into the BamHI site of the multiple cloning site of pJVED_(a).

Measurement of the Alkaline Phosphatase Activity

The presence of activity is detected by the blue color of the colonies growing on a culture medium containing the substrate 5-bromo-4-chloro-3-indolyl phosphate (XP), and then the activity can be quantitatively measured more precisely in the following manner:

M. smegmatis was cultured in an LB medium supplemented with 0.05% Tween 80 (Aldrich) and kanamycin (20 μg/ml) at 37° C. for 24 hours. The alkaline phosphatase activity was measured by the Brockman and Heppel method (Brockman et al., 1968) in a sonicated extract, with p-nitrophenyl phosphate as reaction substrate. The quantity of proteins was measured by the Bio-Rad assay. The alkaline phosphatase activity is expressed as arbitrary units (optical density at 420 nm × μg of protein⁻¹ × minutes⁻¹).
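The arbitrary unit defined above (optical density at 420 nm per μg of protein per minute) can be illustrated with a short calculation. The following Python sketch is only an illustration; the function name and the numerical readings are hypothetical and are not taken from the experiments described here.

```python
def alkaline_phosphatase_units(od420, protein_ug, minutes):
    """Arbitrary units: OD at 420 nm divided by (ug of protein x minutes)."""
    if protein_ug <= 0 or minutes <= 0:
        raise ValueError("protein amount and reaction time must be positive")
    return od420 / (protein_ug * minutes)


# Hypothetical reading: OD420 of 0.60 after 30 minutes with 20 ug of protein.
units = alkaline_phosphatase_units(od420=0.60, protein_ug=20.0, minutes=30.0)
print(f"{units:.4f} arbitrary units")
```

Normalizing by both the protein quantity and the reaction time allows extracts of different concentrations to be compared.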

Measurement of the Luciferase Activity

M. smegmatis was cultured in an LB medium supplemented with 0.05% Tween 80 (Aldrich) and kanamycin (20 μg/ml) at 37° C. for 24 hours and used in full exponential growth (OD at 600 nm between 0.3 and 0.8). The aliquots of bacterial suspensions were briefly sonicated and the cell extract was used to measure the luciferase activity. 25 μl of the sonicated extract were mixed with 100 μl of substrate (Promega luciferase assay system) automatically in a luminometer and the emitted light expressed in RLU (Relative Light Units). The bacteria were counted by serial dilutions of the original suspension on LB-kanamycin agar medium and the luciferase activity expressed in RLU/μg of bacterial protein or in RLU/10³ bacteria.
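The conversion of a luminometer reading into RLU/10³ bacteria combines the plate count with the volume of extract assayed. The Python sketch below shows one way to carry out this arithmetic; it assumes that the 25 μl of extract derives from the same suspension that was plated, and all names and numbers are hypothetical.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Bacterial titer of the original suspension from one countable plate."""
    return colonies * dilution_factor / plated_volume_ml


def rlu_per_1000_bacteria(rlu, titer_per_ml, assayed_volume_ml):
    """Light emission normalized to 10^3 bacteria present in the assayed volume."""
    bacteria_in_assay = titer_per_ml * assayed_volume_ml
    return rlu / bacteria_in_assay * 1000.0


# Hypothetical example: 150 colonies from 0.1 ml of a 10^4-fold dilution,
# and 5475 RLU measured on 25 ul (0.025 ml) of extract.
titer = cfu_per_ml(colonies=150, dilution_factor=1e4, plated_volume_ml=0.1)
print(f"{titer:.2e} bacteria/ml")
print(f"{rlu_per_1000_bacteria(5475, titer, 0.025):.1f} RLU/10^3 bacteria")
```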

Construction of M. tuberculosis and M. bovis-BCG Genomic Libraries

The libraries were obtained essentially using pJVED_(a,b,c), which are described above.

Preparation of macrophages derived from bone marrow and infection with recombinant M. smegmatis.

The macrophages derived from bone marrow were prepared as described by Lang et al., 1991. In summary, the bone marrow cells were removed from the femur of 6- to 12-week old C57BL/6 mice (Iffa-Credo, France). The cells in suspension were washed and resuspended in DMEM enriched with 10% fetal calf serum, 10% of conditioned L-cell medium and 2 mM glutamine, without antibiotics. 10⁶ cells were inoculated on flat-bottomed 24-well Costar plates in 1 ml. After four days at 37° C. in a humid atmosphere containing 10% CO₂, the macrophages were rinsed and reincubated for an additional two to four days. The cells of a control well were lysed with 0.1% Triton X-100 in water and the nuclei enumerated. About 5×10⁵ adherent cells were counted. For the infection, M. smegmatis carrying the different plasmids was cultured to full exponential phase (OD_(600 nm) between 0.4 and 0.8) and diluted to an OD of 0.1 and then 10-fold in macrophage medium. 1 ml was added to each well and the plates were centrifuged and incubated for four hours at 37° C. After three washes, the cells were incubated in a medium containing amikacin for two hours. After three new washes, the adherent infected cells were incubated in macrophage medium overnight. The cells were then lysed in 0.5 ml of lysis buffer (Promega). 100 μl were sonicated and the light emitted was measured on 25 μl. Simultaneously, the bacteria were enumerated by spreading on L-agar-kanamycin (20 μg/ml). The light emitted is expressed in RLU/10³ bacteria.

Analyses of the Databanks

The nucleotide sequences were compared with EMBL and GenBank using the FASTA algorithm, and the protein sequences were analyzed for similarity against the PIR and Swiss-Prot databanks using the BLAST algorithm.

Example 1 The pJVED Vectors

The pJVED vectors (FIG. 51) are plasmids carrying an E. coli truncated phoA gene without initiation codon, signal sequence and regulatory sequence. The multiple cloning site (MCS) allows the insertion of fragments of the genes encoding potential exported proteins as well as their regulatory sequences. Consequently, the fusion protein may be produced and may exhibit an alkaline phosphatase activity if it is exported. Only the fusions in phase may be produced. Thus, the MCS was modified so that the fusions may be obtained in six reading frames. The firefly luciferase luc gene was inserted downstream of phoA. The complete gene with the initiation codon, but without any promoter having been used, thus ought to be expressed with phoA as in a synthetic operon. A new ribosome-binding site was inserted eight nucleotides upstream of the luc initiation codon. Two transcriptional terminators are present in the pJVED vectors, one upstream of the MCS and a second downstream of luc. These vectors are E. coli-mycobacterium transfer plasmids with a kanamycin-resistance gene as selectable marker.
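The reason three vector variants are needed for in-phase fusions can be shown with modular arithmetic. The Python sketch below is a simplified model and not a description of the actual junction sequences: it assumes that pJVED_(b) and pJVED_(c) differ from pJVED_(a) only by the removal of one or two nucleotides between the insert and phoA, and the offset values are hypothetical.

```python
from typing import List

# Assumption: nucleotides removed at the junction relative to pJVED_(a).
DELETED_NT = {"pJVED_a": 0, "pJVED_b": 1, "pJVED_c": 2}


def vectors_in_frame(offset_in_a: int) -> List[str]:
    """Return the variants in which phoA is in phase with the insert.

    offset_in_a is the number of nucleotides between the first base of the
    insert's initiation codon and the first base of phoA in pJVED_(a).
    """
    return [name for name, deleted in DELETED_NT.items()
            if (offset_in_a - deleted) % 3 == 0]


# Whatever the offset, exactly one of the three variants gives a fusion in phase.
for offset in (300, 301, 302):
    print(offset, vectors_in_frame(offset))
```

Since a fragment can be inserted in either orientation, three variants combined with two orientations cover the six reading frames mentioned above.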

phoA and luc function as in an operon, but export is necessary for the phoA activity.

Four plasmids were constructed by insertion into the MCS of DNA fragments of diverse origin:

In the first construct, called pJVED/blaF, the 1.4 kb fragment is derived from the plasmid already described pLA71 (Lim et al., 1995). This fragment, derived from the β-lactamase gene (blaF) of M. fortuitum D216 (Timm et al., 1994), includes the hyperactive mutated promoter, the segment encoding 32 amino acids of the signal sequence and the first 5 amino acids of the mature protein. Thus, this construct includes the strongest promoter known in mycobacterium and the elements necessary for the export of the phoA fusion protein. Consequently, a strong light emission and a good phoA activity can be expected from this construct (cf. FIGS. 53 and 54).

Into a second construct, called pJVED/hsp18, a 1.5 kb fragment was cloned from the plasmid already described pPM1745 (Servant et al., 1995). This fragment includes the nucleotides encoding the first ten amino acids of the 18 kDa heat shock protein derived from Streptomyces albus (heat shock protein 18, HSP 18), the ribosome-binding site, the promoter and, upstream, regulatory sites controlling its expression. This protein belongs to the alpha-crystallin family of low-molecular-weight HSP (Verbon et al., 1992). Its homolog, derived from M. leprae, the 18 kDa antigen, is already known to be induced during phagocytosis by a murine macrophage of the J-774 cell line (Dellagostin et al., 1995). Under standard culture conditions, pJVED/hsp18 shows a weak luc activity and no phoA activity (cf. FIGS. 53 and 54).

In a third construct, called pJVED/P19 kDa, the insert derived from pExp410 (Lim et al., 1995) was cleaved and cloned into the MCS of pJVED_(a). This fragment includes the nucleotides encoding the first 134 amino acids of the known M. tuberculosis 19 kDa protein and its regulatory sequences. As has been demonstrated, this protein is a glycosylated lipoprotein (Garbe et al., 1993; Herrmann et al., 1996). In FIGS. 53 and 54, a good luc activity corresponding to a strong promoter is observed for this construct, and its phoA activity is the strongest of the four constructs. The high phoA activity of this fusion protein with a lipoprotein is explained by the fact that it remains attached to the cell wall by its N-terminal end.

In the fourth and last construct, called pJVED/erp, the insert is derived from pExp53 (Lim et al., 1995) and was cloned into the MCS of pJVED_(a). pExp53 is the initial plasmid selected for its phoA activity and containing a portion of the M. tuberculosis erp gene which encodes a 28-kDa antigen. The latter includes the signal sequence, a portion of the mature protein and, upstream of the initiation codon, the ribosome-binding site. The promoter was mapped. A putative iron box of the fur type is present in this region and flanks the −35 region of the promoter (Berthet et al., 1995). As expected (FIGS. 53 and 54), this construct exhibits a good light emission and a good phoA activity. The fact that this fusion protein, unlike the fusion with the lipoprotein of 19 kDa, does not appear to be attached to the cell wall does not exclude the possibility that the native protein is associated with it. Furthermore, the C-terminal end of erp is absent from the fusion protein.

Example 2

Construction of an M. tuberculosis genomic DNA library in the pJVED vectors and identification of one of the members of these libraries (DP428), induced during phagocytosis by murine macrophages derived from bone marrow.

The various constructs were tested for their capacity to evaluate the intracellular expression of the genes identified by the expression of phoA. For this purpose, the luc activity is expressed in RLU per 10³ bacteria in axenic culture and/or under intracellular conditions. The induction or the repression following phagocytosis by the bone marrow-derived murine macrophages can be suitably evaluated by the measurement of specific activities. The results of two separate experiments are presented in Table 2.

The plasmid pJVED/hsp18 was used as positive control for the induction during the intracellular growth phase. Although the induction of the promoter by heating the bacterium at 42° C. was not conclusive, the phagocytosis of the bacterium clearly leads to an increase in the activity of the promoter. In all the experiments, the intracellular luc activity was strongly induced, increasing by up to 100-fold the initially weak basal activity (Servant et al., 1995).

The plasmid pJVED/blaF was used as a control for nonspecific modulation during the phagocytosis. It was possible to detect weak variations which were probably due to changes in culture conditions. Whatever the case, these weak variations are not comparable to the induction observed with the plasmid pJVED/hsp18.

All the members of the DNA library were tested by measuring the activity of the promoter during the intracellular growth. Among these, DP428 is strongly induced during phagocytosis (Tables 1 and 2).

TABLE 1

Construct | % Recovery | RLU/10³ extracellular bacteria | RLU/10³ intracellular bacteria | Induction
pJVED/blaF* | 0.5 | 1460 | 1727 | 1.2
pJVED/hsp18 | 0.6 | 8 | 57 | 7.1
pJVED/DP428 | 0.7 | 0.06 | 18 | 300

Construct | % Recovery (C57BL/6) | % Recovery (Balb/C) | RLU/10³ extracellular bacteria | RLU/10³ intracellular bacteria (C57BL/6) | RLU/10³ intracellular bacteria (Balb/C) | Induction (C57BL/6) | Induction (Balb/C)
pJVED/blaF* | 7 | 1.1 | 662 | 250 | 911 | 0.4 | 1.4
pJVED/hsp18 | 6.7 | 1.7 | 164 | 261 | 325 | 1.6 | 2
pJVED/DP428 | 1.6 | 2.1 | 0.08 | 1.25 | 3.3 | 15.6 | 41

TABLE 2

Construct | % Recovery | RLU/10³ extracellular bacteria | RLU/10³ intracellular bacteria | Induction
pJVED/blaF* | 22 | 1477 | 367 | 0.25
pJVED/hsp18 | 7 | 0.26 | 6.8 | 26
pJVED/DP428 | 21 | 0.14 | 4 | 28

The nucleotide fragment encoding the N-terminal region of the polypeptide DP428 having the sequence SEQ ID NO: 543 is contained in the plasmid deposited at the CNCM under the No. I-1818.

The entire sequence encoding the polypeptide DP428 was obtained as detailed below.

A probe was obtained by PCR with the aid of oligonucleotides having the sequence SEQ ID NO: 528 and SEQ ID NO: 529. This probe was labeled by random extension in the presence of [³²P]dCTP. Hybridization of the genomic DNA of M. tuberculosis strain Mt 103 previously digested with the endonuclease ScaI was carried out with the aid of said probe. The results of the hybridization revealed that a DNA fragment of about 1.7 kb was labeled. Because a ScaI site exists, extending from the nucleotide nt 984 to the nucleotide nt 989 of the sequence SEQ ID NO: 1, that is to say on the 5′ side of the sequence used as probe, the end of the coding sequence is necessarily present in the fragment detected by hybridization.

The genomic DNA of the M. tuberculosis Mt 103 strain, after digestion with ScaI, was subjected to migration on agarose gel. The fragments of between 1.6 and 1.8 kb in size were cloned into the vector pSL1180 (Pharmacia) previously cleaved with ScaI and dephosphorylated. After transformation of E. coli with the resulting recombinant vectors, the colonies obtained were screened with the aid of the probe. The screening made it possible to isolate six colonies hybridizing with this probe.

The inserts contained in the plasmids of the previously selected recombinant clones were sequenced and then the sequences aligned so as to determine the entire sequence encoding DP428, more specifically SEQ ID NO: 35.

A pair of primers was synthesized in order to amplify, starting with the genomic DNA of M. tuberculosis, strain Mt 103, the entire sequence encoding the polypeptide DP428. The amplicon obtained was cloned into an expression vector.

Pairs of primers appropriate for the amplification and the cloning of the sequence encoding the polypeptide DP428 can be easily produced by persons skilled in the art, on the basis of the nucleotide sequences SEQ ID NO: 1 and SEQ ID NO: 35.

A specific pair of primers according to the invention is the following pair of primers, which is capable of amplifying the DNA encoding the polypeptide DP428 lacking its signal sequence:

forward primer (SEQ ID NO: 917), comprising the sequence going from the nucleotide at position nt 531 to the nucleotide nt 554 of the sequence SEQ ID NO: 35: 5′-AGTGCATGCTGCTGGCCGAACCATCAGCGAC-3′

backward primer (SEQ ID NO: 918), comprising the sequence complementary to the forward sequence of the nucleotide at position nt 855 to the nucleotide at position nt 835 of the sequence SEQ ID NO: 35: 5′-CAGCCAGATCTGCGGGCGCCCTGCACGGCCTG-3′,

in which the portion underlined represents the sequences hybridizing specifically with the sequence SEQ ID NO: 35 and the 5′ ends correspond to restriction sites for the cloning of the resulting amplicon into a cloning and/or expression vector.

A specific vector used for the expression of the polypeptide DP428 is the vector pQE70 marketed by the company Qiagen.
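The restriction sites carried by the 5′ ends of such primers can be located by a simple text search. The Python sketch below scans the two DP428 primers given above for a small set of well-known recognition sequences; the enzyme table and the function name are illustrative choices, and the enzymes actually used for the cloning are not stated in this passage.

```python
# Recognition sequences of a few common restriction enzymes (5'->3').
SITES = {
    "SphI": "GCATGC",
    "BglII": "AGATCT",
    "NcoI": "CCATGG",
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
    "NotI": "GCGGCCGC",
    "PstI": "CTGCAG",
    "ScaI": "AGTACT",
}


def find_sites(primer):
    """Map each enzyme whose site occurs in the primer to its 0-based position."""
    primer = primer.upper()
    hits = {}
    for enzyme, site in SITES.items():
        position = primer.find(site)
        if position != -1:
            hits[enzyme] = position
    return hits


forward = "AGTGCATGCTGCTGGCCGAACCATCAGCGAC"    # SEQ ID NO: 917
backward = "CAGCCAGATCTGCGGGCGCCCTGCACGGCCTG"  # SEQ ID NO: 918

print("forward:", find_sites(forward))
print("backward:", find_sites(backward))
```

With this table, the scan finds an SphI recognition sequence near the 5′ end of the forward primer and a BglII recognition sequence near the 5′ end of the backward primer.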

Example 3

The complete sequence of the DP428 gene and its flanking regions.

A probe of the coding region of DP428 was obtained by PCR and used to hybridize the genomic DNA of various mycobacterial species. According to the results of FIG. 3, the gene is present only in mycobacteria of the M. tuberculosis complex.

Analysis of the sequence suggests that DP428 could be part of an operon. The coding sequence and the flanking regions exhibit no homology with known sequences deposited in databanks.

Based on the coding sequence, the gene encodes a 10 kDa protein with a signal peptide and a hydrophobic C-terminal end which ends with two arginines and is preceded by an LPISG (SEQ ID NO: 934) motif similar to the known LPXTG (SEQ ID NO: 935) motif. These two arginines could correspond to a retention signal and the protein DP428 could be attached via this motif to peptidoglycans as has already been described in other Gram⁺ bacteria (Navarre et al., 1994 and 1996).

The mechanism for survival and intracellular growth of mycobacteria is complex and the intimate relationships between the bacteria and the host cell remain unexplained. Whatever the mechanism, the growth and the intracellular survival of mycobacteria depend on factors produced by the bacterium and capable of modulating the response of the host. These factors may be molecules which are exposed at the cell surface, such as LAM or cell surface-associated proteins, or actively secreted molecules.

On the other hand, intracellularly, the bacteria themselves have to confront a hostile environment. They appear to respond to this by means similar to those used under stress conditions, by inducing heat shock proteins (Dellagostin et al., 1995), but also by the induction or the repression of various proteins (Lee et al., 1995). Using a methodology derived from PCR, Plum and Clark-Curtiss (Plum et al., 1994) have shown that an M. avium gene included in a 3 kb DNA fragment is induced after phagocytosis by human macrophages. This gene encodes an exported protein comprising a leader sequence but exhibiting no significant homology with the sequences proposed by databanks. The induction, during the intracellular growth phase, of a low-molecular-weight heat shock protein derived from M. leprae has also been demonstrated (Dellagostin et al., 1995). In another study, the bacterial proteins from M. tuberculosis were metabolically labeled during the intracellular growth phase or under stress conditions and separated by two-dimensional gel electrophoresis: 16 M. tuberculosis proteins were induced and 28 were repressed. The same proteins are involved during stress caused by a low pH, a heat shock, H₂O₂, or during phagocytosis by human monocytes of the THP1 line. Whatever the case, the behavior of the induced and repressed proteins was unique under each condition (Lee et al., 1995). Taken together, these results indicate that a subtle molecular dialogue is installed between the bacteria and their host cells. This dialogue probably depends on the fate of the intracellular organism.

In this context, the induction of the expression of DP428 could be of major importance, indicating an important role for this protein in intracellular survival and growth.

The method used in these experiments to evaluate the intracellular expression of the genes (cf. Jacobs et al., 1993, for the method for determining the expression of firefly luciferase, and Lim et al., 1995, for the method for determining the expression of the PhoA gene) has the advantage of being simple compared with the other techniques such as the technique described by Mahan et al. (Mahan et al., 1993) adapted to mycobacteria and proposed by Bange et al. (Bange et al., 1996) or the subtractive method based on PCR described by Plum and Clark-Curtiss (Plum et al., 1994). Variability undoubtedly exists as shown by comparing the various experiments. Although causing the induction or the repression is sufficient, it is now possible to evaluate it, thus providing an additional tool for the physiological studies of the exported proteins identified by fusion with phoA.

Example 4

Search for modulation of the activity of the promoters during the intramacrophage phases.

Mouse bone marrow macrophages are prepared as described by Lang and Antoine (Lang et al., 1991). Recombinant M. smegmatis bacteria, whose luciferase activity per 10³ bacteria has been determined as above, are incubated at 37° C. under a humid atmosphere enriched with 5% CO₂, for 4 hours in the presence of these macrophages such that they are phagocytosed. After rinsing in order to remove the remaining extracellular bacteria, amikacin (100 μg/ml) is added to the culture medium for two hours. After another rinsing, the medium is replaced with an antibiotic-free culture medium (DMEM enriched with 10% calf serum and 2 mM glutamine). After overnight incubation as above, the macrophages are lysed at low temperature (4° C.) with the aid of a lysis buffer (cell lysis buffer, Promega), and the luciferase activity per 10³ bacteria is determined. The ratio of the activity after one night to the activity at placing in culture gives the coefficient of induction.
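The coefficient of induction is a simple ratio of two luciferase activities, each expressed in RLU/10³ bacteria. The following Python sketch recomputes it from the first part of Table 1; the function name is hypothetical, and the rounding is applied only for display.

```python
def induction_coefficient(extracellular_rlu, intracellular_rlu):
    """Ratio of intracellular to extracellular activity (both in RLU/10^3 bacteria)."""
    if extracellular_rlu <= 0:
        raise ValueError("extracellular activity must be positive")
    return intracellular_rlu / extracellular_rlu


# (extracellular, intracellular) values taken from the first part of Table 1.
TABLE_1 = {
    "pJVED/blaF": (1460, 1727),
    "pJVED/hsp18": (8, 57),
    "pJVED/DP428": (0.06, 18),
}

for construct, (outside, inside) in TABLE_1.items():
    print(construct, round(induction_coefficient(outside, inside), 1))
```

A coefficient close to 1 (pJVED/blaF) indicates no specific modulation, whereas a value well above 1 (pJVED/DP428) indicates induction during the intramacrophage phase.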

Example 5

Isolation of a series of sequences by sequencing directly using colonies.

A series of sequences allowing the expression and export of phoA were isolated from the DNA of M. tuberculosis or of M. bovis BCG. Among this group of sequences, two were studied further: the entire genes corresponding to the inserts were cloned and sequenced, and antibodies against the products of these genes served to show by electron microscopy that these genes encode antigens found at the surface of the tubercle bacilli. One of these genes, erp, encodes a consensus export signal sequence; the other, des, possessed no characteristic of a gene encoding an exported protein, based on the sequence. Another gene, DP428, was sequenced before the sequence of the M. tuberculosis genome became available. It contains a sequence resembling the consensus sequence for attachment to peptidoglycan, which suggests that it is also an antigen which is probably found at the surface of the tubercle bacilli. The study of the three genes, erp, des, and that encoding DP428, shows that the phoA system which we have developed in mycobacteria makes it possible to pick out genes encoding exported proteins with no determinant which can be picked out by studies in silico. This is particularly true for the polypeptides which do not possess a consensus signal sequence (des) or which have no similarity with proteins having a known function (erp and DP428).

A number of inserts were identified and sequenced before knowing the genome of M. tuberculosis or of others below. These sequences may be considered as primers which make it possible to search for genes encoding exported proteins. To date, a series of primers have been sequenced and the entire corresponding genes have been either sequenced or identified based on the published sequence of the genome. To take into account sequencing errors which are always possible, the regions upstream or downstream of some primers were considered as being capable of forming part of sequences encoding exported proteins. In some cases, similarities with genes encoding exported proteins or sequences characteristic of export signals or topological characteristics of membrane proteins were detected.

Primer sequences are found to correspond to genes belonging to families of genes possessing more than 50% similarity. It is thus possible to indicate that the other genes detected by similarity with a primer encode exported proteins. This is the case for the sequences SEQ ID NO: 154 and SEQ ID NO: 156, which possess more than 77% similarity with SEQ ID NOS: 137 and 143.

The sequences which may encode exported proteins are the following: SEQ ID NOS: 1, 8, 14, 25, 31, 33, 137, 139, 141, 143, 145, 148, 150, 152, 154, 156, 158, 160, 162, 225, 228, 238, 246, 250, 255, 258, 260, 41, 46, 52, 165, 169, 177, 407, 410, 412, 419, 421, 426, 429, 431, 433, 437, 441, 447, 452, 456, 459, 461, 110, 113, 119, 353, 357, 359, 489, 495, 497, 501, 505, 510, 516, 519, 522, 651, 653, 657, 660, 662, 759, 761, 764, 767, 769, 811, 813, 817, 821, 823, 887, 895, 901, 907, and 909.

Genes identified based on the primers from the sequence of the genome have no characteristic (based on the sequence) of the exported proteins. They are the following sequences: SEQ ID NOS: 57-61, 63, 65-66, 68, 70-71, 73, 75, 77, 79-80, 82-83, 85, 87, 531-533, 535-536, 538-542, 185-188, 190-194, 196-199, 201, 203-205, 207-208, 210, 212, 214-216, 218-219, 221-224, 263-267, 269-271, 275-277, 279, 281, 283, 285, 287, 289, 291-296, 298-309, 311-316, 129-132, 134-136, 318-320, 322, 324, 326, 328-330, 332, 334, 336, 338, 340-345, 348-352, 362-363, 365-367, 369, 370, 372-373, 375-379, 381-382, 384, 386, 388, 390-392, 394, 396, 398, 400-402, 404, 406, 464-468, 470-471, 473, 475, 477-481, 483-484, 486, 488, 547-549, 551, 553, 555, 557, 559-563, 565-568, 570, 572, 574-575, 577-579, 581-583, 585, 587, 589, 591-593, 595, 597, 599, 601-603, 605-607, 609, 611, 613, 615, 617, 619, 621, 623, 625, 627-628, 630, 632, 634, 636-639, 641-646, 648, 650, 665, 931-933, 667-668, 670-673, 675, 677, 679-682, 684-685, 687-690, 692, 694, 696, 698-701, 703-716, 718-727, 729-732, 734-735, 737, 738, 740, 742, 744-745, 747-751, 753-754, 756, 758, 772-780, 781-783, 785-793, 795-804, 806, 808, 810, 826, 828-830, 832, 834, 836, 838, 840-841, 843, 845, 847, 849-863, 865-877, 879-882, 884, and 886.

Based on the sequence of other organisms such as E. coli, it is possible to search in the sequence of the M. tuberculosis genome for genes possessing similarities with proteins known to be exported in other organisms although not possessing an export signal sequence. In this case, fusion with phoA is an advantageous protocol for determining if these M. tuberculosis sequences encode exported proteins although possessing no consensus signal sequence. It has indeed been possible to clone SEQ ID NOS: 848, 864, 878, 883, and 885, sequences similar to an E. coli gene of the htrA family. A fusion of SEQ ID NOS: 848, 864, 878, 883, and 885 with phoA leads to the expression and the export of phoA. M. smegmatis colonies harboring SEQ ID NOS: 848, 864, 878, 883, and 885 phoA fusion on a plasmid pJVED are blue.

SEQ ID NOS: 849-863, 865-877, 879, 880-882, 884, and 886 are therefore considered exported proteins.
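Sequence lists written as ranges, such as the one above, are easier to compare with one another once expanded into individual identifiers. The Python sketch below is a hypothetical helper, not part of the method described here; it expands the list given in the preceding sentence.

```python
def expand_seq_ids(spec):
    """Expand a list such as '849-863, 879, and 886' into individual integers."""
    ids = []
    for part in spec.replace("and", "").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            ids.extend(range(int(first), int(last) + 1))
        else:
            ids.append(int(part))
    return ids


htra_related = expand_seq_ids("849-863, 865-877, 879, 880-882, 884, and 886")
print(len(htra_related), "sequence identifiers")
print(864 in htra_related)  # 864 is one of the cloned sequences, not in this list
```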

The phoA method is therefore useful for detecting, based on the M. tuberculosis sequence, genes encoding exported proteins even when they do not encode sequences which are characteristic of the exported proteins.

Even if a sequence possesses determinants of exported proteins, this does not demonstrate a functional export. The phoA system makes it possible to show that the gene suspected really encodes an exported protein. Thus, it was checked that the sequences SEQ ID NOS: 887, 895, 901, 907, and 909 indeed possessed export signals.

TABLE 3

SEQ ID No. | Reference of the corresponding sequence predicted by Cole et al. | Annotation
SEQ ID NOS: 2-7, 9-13, 15-24, 26-30, 32, 34 | Rv 0203 * | Sequence hydrophobic at the N-terminus
SEQ ID NOS: 57-61, 63, 65-66, 68, 70-71, 73, 75, 77, 79-80, 82-83, 85, 87, 531-533, 535-536, 538-542 | Rv 2050 | No prediction
SEQ ID NOS: 138, 272-273, 140, 142, 144, 146-147, 149, 151, 153, 155, 157, 159, 161, 163-164 | Rv 2563 * | Membrane protein
SEQ ID NOS: 155, 157 | Rv 0072 * | Possible transmembrane transport protein of the ABC type
SEQ ID NOS: 185-188, 190-194, 196-199, 201, 203-205, 207-208, 210, 212 | Rv 0546c ML | Protein S-D Lactoyl Glutathione-methyl glyoxal lyase
SEQ ID NOS: 214-216, 218, 219, 221-224 | no prediction | not found in M. tuberculosis H37rv
SEQ ID NOS: 226-227, 923-925, 229-237, 239-245, 247-249, 251-254, 256-257, 259, 261, 42-45, 47-51, 53-55, 166-168, 170-176, 178-183 | Rv 1984c * | probable precursor cutinase with an N-terminal signal sequence
SEQ ID NOS: 263-267, 269-271, 275-277, 279, 281, 283, 285, 287, 289, 291-296, 298-309, 311-316, 123-127, 129-132, 134-136 | no prediction | no prediction
SEQ ID NOS: 318-320, 322, 324, 326, 328-330, 332, 334, 336, 338, 340-345, 348-352 | with reading frame shift, could be in phase with Rv 2530c | no prediction
SEQ ID NOS: 362-363, 365-367, 369-370, 372-373, 375-379, 381-382, 384, 386 | Rv 1303 ML | no prediction
SEQ ID NOS: 388, 390-392, 394, 396, 398, 400-402, 404, 406 | Rv 0199 ML | no prediction
SEQ ID NOS: 408-409, 411, 413-418, 420, 422-425, 427-428, 430, 432 | Rv 0418 * | site for attachment of prokaryotic membrane lipoprotein, similarity with N-acetyl puromycin acetyl hydrolase
SEQ ID NOS: 434-436, 438-440, 442-446, 448-451, 453-455, 457-458, 460, 462, 111-112, 114-118, 120-121 | Rv 3576 * | contains a site for attachment of prokaryotic membrane lipoprotein, similarity with a serine/threonine protein kinase
SEQ ID NOS: 464-468, 470-471, 473, 475, 477-481, 483-484, 486, 488 | Rv 3365c ML | similarity with a zinc metallopeptidase
SEQ ID NOS: 547-549, 551, 553, 555 | not predicted | no prediction
SEQ ID NOS: 557, 559-563, 565-568, 570, 572 | Rv 0822c ML | Existence of a consensus region with the drac family
SEQ ID NOS: 574-575, 577-579, 581-583, 585, 587 | Rv 1044 | no prediction
SEQ ID NOS: 589, 591-593, 595, 597 | not predicted | no prediction
SEQ ID NOS: 599, 601-603, 605-607, 609, 611 | Rv 2169c | no prediction
SEQ ID NOS: 613, 615, 617, 619, 621 | Rv 3909 ML | no prediction
SEQ ID NOS: 623, 625, 627-628, 630, 632 | Rv 2753c | similarity with dihydrodipicolinate synthases
SEQ ID NOS: 634, 636-639, 641-646, 648, 650 | Rv 0175 | no prediction
SEQ ID NOS: 652, 654-656, 658-659, 661, 663 | Rv 3006 * ML | prediction of lipoprotein signal sequence
SEQ ID NOS: 665, 931-933, 667-668, 670-673, 675, 677 | Rv 0549c | no prediction
SEQ ID NOS: 679-682, 684-685, 687-690, 692, 694, 696, 698-701, 703-716, 718-727 | Rv 2975c being capable of being in phase with Rv 2974c | similarity with subtilis protein
SEQ ID NOS: 729-732, 734-735, 737-738, 740, 742 | Rv 2622 | similarity with a methyl transferase
SEQ ID NOS: 744-745, 747-751, 753-754, 756, 758 | Rv 3278c ML | no prediction
SEQ ID NOS: 760, 762-763, 765-766, 768, 770 | Rv 0309 * | no prediction
SEQ ID NOS: 772-783, 785-793, 795-804, 806, 808, 810 | Rv 2169c ML | no prediction
SEQ ID NOS: 812, 814-816, 818-820, 822, 824 | Rv 1411c * | probable lipoprotein with an N-terminal signal sequence
SEQ ID NOS: 826, 828-830, 832, 834, 836 | Rv 1714 | similarity with a gluconate 3-dehydrogenase
SEQ ID NOS: 838, 840-841, 843, 845, 847 | Rv 0331 | similarity with a sulfide dehydrogenase and a sulfide quinone reductase
SEQ ID NOS: 849-863, 865-877, 879-882, 884, 886 | Rv 0983 ML | similarity with a serine protease HtrA
SEQ ID NOS: 89, 91, 93-95, 97, 99, 101-103, 105, 107, 109 | |
SEQ ID NOS: 354-356, 358, 360, 926-930 | Rv 3810 * ML | Surface protein (Berthet et al., 1995)
SEQ ID NOS: 490-494, 496, 498-500, 502-504, 506-509, 511-515, 517-518, 520-521, 523-527 | Rv 3763 * | Contains a site for attachment of eukaryotic membrane lipoprotein
SEQ ID NOS: 888-894, 896-900, 902-906, 908, 910 | Rv 0125 * | Active site of serine proteases. Possible N-terminal signal sequence

Legend to Table 3:
Correspondence between the sequences according to the invention and the sequences predicted by Cole et al., 1998, Nature, 393, 537-544.
*: Prediction that the protein encoded by the sequence is exported.
ML: Prediction of similarity with M. leprae.

Example 6

Characteristics and production of the protein M1C25.

The N-terminal end of the protein M1C25 was detected by the PhoA system as allowing the export of the fusion protein, necessary for the production of its phosphatase activity.

The DNA sequence encoding the N-terminal end of the protein M1C25 is contained in the sequences SEQ ID NOS: 433, 437, 441, 447, 452, 456, 459, 461 of the present patent application.

From this primer sequence, the complete gene encoding the protein M1C25 was sought in the M. tuberculosis genome (Wellcome Trust Foundation, Sanger site).

The Sanger center attributed to M1C25 the names:

Rv 3576,

MTCY06G11.23,

pknM

Sequence SEQ ID NO: 544 of the Complete M1C25 Gene (714 Bases): cf. FIG. 29

This gene encodes a protein of 237 AA, having a molecular mass of 25 kDa. This protein is listed in the libraries under the names:

PID:e306716,

SPTREMBL:P96858
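The relationship between the length of the gene (714 bases) and the length of the protein (237 AA) can be checked with simple arithmetic. The Python sketch below assumes that the 714 bases include the stop codon, and uses the common rule of thumb of about 110 Da per residue, which gives only a rough mass estimate; both points are assumptions, not statements from the sequence data.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average, not an exact value


def protein_length_from_orf(orf_length_nt, includes_stop=True):
    """Number of amino acids encoded by an open reading frame of the given length."""
    if orf_length_nt % 3 != 0:
        raise ValueError("an open reading frame must contain whole codons")
    codons = orf_length_nt // 3
    return codons - 1 if includes_stop else codons


residues = protein_length_from_orf(714)
approx_mass_kda = residues * AVERAGE_RESIDUE_MASS_DA / 1000.0
print(residues, "amino acids")
print(f"about {approx_mass_kda:.0f} kDa")
```

Under these assumptions, 714 bases correspond to 238 codons, that is 237 amino acids plus a stop codon, and the estimated mass is of the same order as the 25 kDa indicated above.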

Sequence SEQ ID NO: 545 of the Protein M1C25 (235 Amino Acids): cf. FIG. 30

M1C25 contains a site for attachment to the lipid portion of the prokaryotic membrane lipoproteins (PS00013 Prokaryotic membrane lipoprotein lipid attachment site: (SEQ ID NO: 919) CTGGTCGGTG CGTGCATGCT CGGAGCCGGA TGC).

The function of M1C25 is not clear but it most probably possesses a “serine/threonine protein kinase” activity. Similarities should be noted with the C-terminal moiety of K08G_MYCTU Q11053 Rv1266c (MTCY50.16). Similarities are also found with KY28_MYCTU.

A gene potentially encoding a regulatory protein (PID:e306715, SPTREMBL:P96857, Rv3575c, (MTCY06G11.22c)) is found in 5′ of the gene encoding M1C25.

The hydrophobicity profile (Kyte and Doolittle) of M1C25 is represented in FIG. 56.

A site of cleavage of the signal sequence is predicted (SignalP V1.1; World Wide Web Prediction Server, Center for Biological Sequence Analysis) between amino acids 31 and 32: AVA-AD. This cleavage site follows a conventional “AXA” motif. This prediction is compatible with the hydrophobicity profile. In this potential signal sequence, it is observed that the sequence of the three amino acids LAA is repeated three times.

Cloning of the M1C25 gene for the production of the protein which itencodes:

A pair of primers were synthesized in order to amplify, using thegenomic DNA of M. tuberculosis, strain H37Rv, the entire sequenceencoding the polypeptide M1C25. The amplicon obtained was cloned into anexpression vector.

Pairs of primers appropriate for the amplification and the cloning ofthe sequence encoding M1C25 were synthesized: forward primer: (SEQ IDNO: 920) 5′-ATAATACCATGGGCAAGCAGCTAGCCGCGC-3′ backward primer: (SEQ IDNO: 921) 5′-ATTTATAGATCTCTGCTTAGCAACCTTGGCCGCG-3′

The underlined portion represents the sequences specifically hybridizingwith the M1C25 sequence and the 5′ ends correspond to restriction sitesfor the cloning of the resulting amplicon into a cloning and/orexpression vector.

A specific vector used for the expression of the polypeptide M1C25 isthe vector pQE60 marketed by the company Qiagen, following the protocoland the recommendations proposed by this brand.

The cells used for the cloning are bacteria: E. coli XL1-Blue (resistantto tetracycline).

The cells used for the expression are bacteria: E. coli M15 (resistant to kanamycin) containing the plasmid pRep4 (M15 pRep4).

The production of the protein M1C25 is illustrated by FIGS. 57A and B (bacterial extracts from the E. coli M15 strain containing the plasmid pM1C25). The bacterial cultures and the extracts are prepared according to Sambrook et al. (1989). Analysis of the bacterial extracts is carried out according to the Qiagen instructions (1997).

BIBLIOGRAPHIC REFERENCES

-   AIDS therapies, 1993, in Mycobacterial infections, ISBN 0-9631698-1-5, pp. 1-11.
-   Altschul, S. F. et al., 1990, J. Mol. Biol., 215: 403-410.
-   Andersen, P. et al., 1991, Infect. Immun., 59: 1905-1910.
-   Andersen, P. et al., 1995, J. Immunol., 154: 3359-3372.
-   Bange, F. C., A. M. Brown, and W. R. Jacobs Jr., 1996, Leucine auxotrophy restricts growth of Mycobacterium bovis BCG in macrophages. Infect. Immun., 64: 1794-1799.
-   Barany, F., 1991, Proc. Natl. Acad. Sci. USA, 88: 189-193.
-   Bates, J., 1979, Chest, 76(Suppl.): 757-763.
-   Bates, J. et al., 1986, Am. Rev. Respir. Dis., 134: 415-417.
-   Berthet, F. X., J. Rauzier, E. M. Lim, W. Philipp, B. Gicquel, and D. Portnoï, 1995, Characterization of the M. tuberculosis erp gene encoding a potential cell surface protein with repetitive structures. Microbiology. In press.
-   Borremans, M. et al., 1989, Biochemistry, 7: 3123-3130.
-   Bouvet, E., 1994, Rev. Fr. Lab., 273: 53-56.
-   Brockman, R. W. and Heppel L. A., 1968, On the localization of alkaline phosphatase and cyclic phosphodiesterase in Escherichia coli, Biochemistry, 7: 2554-2561.
-   Burg, J. L. et al., 1996, Mol. and Cell. Probes, 10: 257-271.
-   Chevrier, D. et al., 1993, Mol. and Cell. Probes, 7: 187-197.
-   Chu, B. C. F. et al., 1986, Nucleic Acids Res., 14: 5591-5603.
-   Clemens, D. L., 1996, Characterization of the Mycobacterium tuberculosis phagosome, Trends Microbiol., 4: 113-118.
-   Clemens, D. L. and Horwitz M. A., 1995, Characterization of the Mycobacterium tuberculosis phagosome and evidence that phagosomal maturation is inhibited, J. Exp. Med., 181: 257-270.
-   Colignon J. E., 1996, Immunologic studies in humans. Measurement of proliferative responses of cultured lymphocytes. Current Protocols in Immunology, NIH, 2, Section II.
-   Daniel, T. M. et al., 1987, Am. Rev. Respir. Dis., 135: 1137-1151.
-   Dellagostin, O. A., Esposito G., Eales L.-J., Dale J. W. and McFadden J. J., 1995, Activity of mycobacterial promoters during intracellular and extracellular growth. Microbiol., 141: 2123-2130.
-   Drake, T. A. et al., 1987, J. Clin. Microbiol., 25: 1442-1445.
-   Dramsi et al., 1997, Infection and Immunity, 65, 5: 1615-1625.
-   Duck, P. et al., 1990, Biotechniques, 9: 142-147.
-   Erlich, H. A., 1989, In PCR Technology. Principles and Applications for DNA Amplification. New York: Stockton Press.
-   Felgner et al., 1987, Proc. Natl. Acad. Sci., 84: 7413.
-   Fraley et al., 1980, J. Biol. Chem., 255: 10431.
-   Gaillard, J. L., Berche P., Frehel C., Gouin E. and Cossart P., 1991, Entry of L. monocytogenes into cells is mediated by internalin, a repeat protein reminiscent of surface antigens from Gram-positive cocci, Cell, 65: 1127-1141.
-   Garbe, T., Harris D., Vordermeir M., Lathigra R., Ivanyi J. and Young D., 1993, Expression of the Mycobacterium tuberculosis 19-kilodalton antigen in Mycobacterium smegmatis: immunological analysis and evidence of glycosylation. Infect. Immun., 61: 260-267.
-   Guateli, J. C. et al., 1990, Proc. Natl. Acad. Sci. USA, 87: 1874-1878.
-   Harboe et al., 1996, Infect. Immun., 64: 16-22.
-   Herrmann, J. L., O'Gaora P., Gallagher A., Thole J. E. R. and Young D. B., 1996, Bacterial glycoproteins: a link between glycosylation and proteolytic cleavage of a 19 kDa antigen from Mycobacterium tuberculosis, EMBO J., 15: 3547-3554.
-   Houbenweyl, 1974, in Methode der Organischen Chemie, E. Wunsch Ed., Volume 15-I et 15-II, Thieme, Stuttgart.
-   Huygen, K. et al., 1996, Nature Medicine, 2(8): 893-898.
-   Innis, M. A. et al., 1990, in PCR Protocols. A Guide to Methods and Applications. San Diego: Academic Press.
-   Isberg, R. R., Voorhis D. L. and Falkow S., 1987, Identification of invasin: a protein that allows enteric bacteria to penetrate cultured mammalian cells, Cell, 50: 769-778.
-   Jacobs, W. R. et al., 1991, Construction of mycobacterial genomic libraries in shuttle cosmids. Genetic Systems for Mycobacteria, Methods in Enzymology, 204: 537-555.
-   Jacobs, W. R. et al., 1993, Science, 260: 819-822.
-   Kaneda, et al., 1989, Science, 243: 375.
-   Kiehn, T. E., et al., 1987, J. Clin. Microbiol., 25: 1551-1552.
-   Kievitis, T. et al., 1991, J. Virol. Methods, 35: 273-286.
-   Kohler, G. et al., 1975, Nature, 256(5517): 495-497.
-   Kwoh, D. Y. et al., 1989, Proc. Natl. Acad. Sci. USA, 86: 1173-1177.
-   Landegren, U. et al., 1988, Science, 241: 1077-1080.
-   Lang, T. and Antoine J.-C., 1991, Localization of MHC class II molecules in murine bone marrow-derived macrophages. Immunology, 72: 199-205.
-   Lee, B. Y. and Horwitz M. A., 1995, Identification of macrophage and stress-induced proteins of Mycobacterium tuberculosis, J. Clin. Invest., 96: 245-249.
-   Lim, E. M., Rauzier J., Timm J., Torrea G., Murray A., Gicquel B. and Portnoï D., 1995, Identification of Mycobacterium tuberculosis DNA sequences encoding exported proteins, using phoA gene fusions, J. Bacteriol., 177: 59-65.
-   Lizardi, P. M. et al., 1988, Bio/technology, 6: 1197-1202.
-   Mahan, M. J. et al., 1993, Selection of bacterial virulence genes that are specifically induced in host tissues, Science, 259: 686-688.
-   Manoil L., Mekolanos J. J. and Beckwith J., 1990, J. Bacteriol., 172: 515-518.
-   Matthew, J. A. et al., 1988, Anal. Biochem., 169: 1-25.
-   Merrifield, R. D., 1966, J. Am. Chem. Soc., 88(21): 5051-5052.
-   Midoux, P. et al., 1993, Nucleic Acids Research, 21: 871-878.
-   Miele, E. A. et al., 1983, J. Mol. Biol., 171: 281-295.
-   Minton, N. P., 1984, Gene, 31: 269-273.
-   Montgomery et al., 1993, DNA Cell Biol., 12: 777-783.
-   Navarre, W. W. et al., 1994, Molecular Microbiology, 14(1): 115-121.
-   Navarre, W. W. et al., 1996, J. of Bacteriology, 178, 2: 441-446.
-   Pagano et al., 1967, J. Virol., 1: 891.
-   Pastore, 1994, Circulation, 90: 1-517.
-   Patel, et al., 1990, J. Clin. Microbiol., 28: 513-518.
-   Prentki, B. and Krish H. M., 1984, Gene, 29: 303-313.
-   Pettersson R., Nordfelth J., Dubinina E., Bergman T., Gustafsson M., Magnusson K. E. and Wolf-Watz H., 1996, Modulation of virulence factor expression by pathogen target cell contact. Science, 273: 1231-1233.
-   Plum, G. and Clark-Curtiss J. E., 1994, Induction of Mycobacterium avium gene expression following phagocytosis by human macrophages. Infect. Immun., 62: 476-483.
-   Roberts, M. C., et al., 1987, J. Clin. Microbiol., 25: 1239-1243.
-   Rolfs, A. et al., 1991, In PCR Topics. Usage of Polymerase Chain Reaction in Genetic and Infectious Disease. Berlin: Springer-Verlag.
-   Sambrook, J. et al., 1989, In Molecular Cloning: A Laboratory Manual. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press.
-   Sanchez-Pescador, R., 1988, J. Clin. Microbiol., 26(10): 1934-1938.
-   Schneewind, O. et al., 1995, Science, 268: 103-106.
-   Segev D., 1992, in Non-radioactive Labeling and Detection of Biomolecules. Kessler C. Springer Verlag, Berlin, New-York, 197-205.
-   Servant, P. and Mazodier P., 1995, Characterization of Streptomyces albus 18-kilodalton heat shock-responsive protein. J. Bacteriol., 177: 2998-3003.
-   Shiver, J. W., 1995, in Vaccines 1995 (eds Chanock, R. M., Brown, F., Ginsberg, H. S. & Norrby, E.), pp. 95-98, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
-   Sorensen et al., 1995, Infect. Immun., 63: 1710-1717.
-   Stone, B. B. et al., 1996, Mol. and Cell. Probes, 10: 359-370.
-   Stover, C. K., Bansal G. P., Hanson M. S., Burlein S. R., Palaszynski S. R., Young J. F., Koenig S., Young D. B., Sadziene A. and Barbour A. G., 1993, Protective immunity elicited by recombinant Bacille Calmette-Guérin (BCG) expressing outer surface protein A (OspA) lipoprotein: a candidate Lyme disease vaccine. J. Exp. Med., 178: 197-209.
-   Sturgill-Koszycki, S., Schlesinger P. H., Chakroborty P., Haddix P. L., Collins H. L., Fok A. K., Allen R. D., Gluck S. L., Heuser J. and Russell D. G., 1994, Lack of acidification of Mycobacterium phagosomes by exclusion of the vesicular proton-ATPase. Science, 263: 678-681.
-   Tascon, R. E. et al., 1996, Nature Medicine, 2(8): 888-892.
-   Technique for assembling oligonucleotides, 1983, Proc. Natl. Acad. Sci. USA, 80: 7461-7465.
-   Technique for beta-cyanethylphosphoramidites, 1986, Bioorganic Chem., 4: 274-325.
-   Thierry, D. et al., 1990, Nucl. Acid Res., 18: 188.
-   Timm, J., Perilli M. G., Duez C., Trias J., Orefici G., Fattorini L., Amicosante G., Oratore A., Boris B., Frere J. M., Pugsley A. P. and Gicquel B., 1994, Transcription and expression analysis, using lacZ and phoA gene fusions, of Mycobacterium fortuitum β-lactamase genes cloned from a natural isolate and a high-level β-lactamase producer. Mol. Microbiol., 12: 491-504.
-   Tuberculosis Prevention Trial, 1980, Mendis, Trial of BCG vaccines in South India for Tuberculosis Infection, Indian J. of Med. Res., 1972 (Suppl.): 1-74.
-   Urdea, M. S. et al., 1991, Nucleic Acids Symp. Ser., 24: 197-200.
-   Urdea, M. S., 1988, Nucleic Acids Research, 11: 4937-4957.
-   Verbon, A., Hartskeerl R. A., Schuitema A., Kolk A. H., Young D. B. and Lathigra R., 1992, The 14,000-molecular-weight antigen of Mycobacterium tuberculosis is related to the alpha-crystallin family of low-molecular-weight heat shock proteins. J. Bacteriol., 174: 1352-1359.
-   Walker, G. T. et al., 1992, Nucleic Acids Res., 20: 1691-1696.
-   Walker, G. T. et al., 1992, Proc. Natl. Acad. Sci. USA, 89: 392-396.
-   Wiker, H. G. et al., 1992, Microbiol. Rev., 56: 648-661.
-   Yamaguchi, R. et al., 1989, Infect. Immun., 57: 283-288.
-   Xu, S., Cooper, A., Sturgill-Koszycki, S., van Heyningen, T., Chatterjee, D., Orme, I., Allen, P. and Russel, D. G., 1994, Intracellular trafficking in Mycobacterium tuberculosis and Mycobacterium avium-infected macrophages, J. Immunol., 153: 2568-2578.
-   Young, D. B. et al., 1992, Mol. Microbiol., 6: 133-145.
-   Yuen, L. K. W. et al., 1993, J. Clin. Microbiol., 31: 1615-1618.

1-74. (canceled)
75. A purified polynucleotide, comprising a nucleotide sequence chosen from: a) SEQ ID NOS: 1, 8, 14, 25, 31, 33 and 35; b) nucleotide 964 to nucleotide 1234, inclusive, of SEQ ID NO: 1; c) a sequence complementary to a nucleotide sequence defined in a) or b); d) a sequence that exhibits at least 50% identity with a nucleotide sequence defined in a), b) or c); and e) a fragment of at least 12 consecutive nucleotides of a nucleotide sequence defined in a), b), or c).
76. A purified polynucleotide consisting of a nucleotide sequence chosen from SEQ ID NOS: 1, 8, 14, 25, 31, 33 and 35.
77. A purified polypeptide encoded by a polynucleotide sequence according to claim 75.
78. A purified polypeptide, comprising an amino acid sequence chosen from: a) SEQ ID NO: 2 to 7, 10 to 13, 15 to 24, 26 to 30, 32, 34, 36 to 40 and 543, b) a biologically active polypeptide fragment of a polypeptide defined in a) having at least 5 amino acids and capable of being exported and/or secreted by a mycobacterium, and/or of being induced or repressed during infection with the mycobacterium; and/or capable of inducing, repressing or modulating, directly or indirectly, a mycobacterium virulence factor; and/or capable of inducing an immunogenicity reaction directed against mycobacteria; and/or capable of being recognized by an antibody which is specific for mycobacterium.
79. A purified polynucleotide, comprising a nucleotide sequence that encodes a polypeptide according to claim 78.
80. A purified polynucleotide primer, comprising a polynucleotide sequence of at least 12 consecutive nucleotides of a polynucleotide according to claim 75.
81. The purified polynucleotide primer according to claim 80, wherein the polynucleotide sequence is chosen from SEQ ID NO: 528 and SEQ ID NO: 529.
82. A polynucleotide comprising a nucleotide sequence according to claim 75, wherein the polynucleotide is labeled with a radioactive compound or with a nonradioactive compound.
83. A polynucleotide comprising a nucleotide sequence according to claim 75, wherein the polynucleotide is covalently or noncovalently immobilized on a support.
84. The polynucleotide according to claim 83, wherein the nucleotide sequence is nucleotide 964 to nucleotide 1234, inclusive, of SEQ ID NO: 1.
85. A recombinant cloning, expression and/or insertion vector, comprising a nucleotide sequence according to claim 75.
86. A host cell transformed with the recombinant vector according to claim 85.
87. The host cell according to claim 86, wherein the cell is the E. coli strain transformed with the plasmid pDP428 deposited on 28 Jan. 1997 at the CNCM under the No. I-1818, or the cell is of a strain of M. tuberculosis, M. bovis or M. africanum.
88. The host cell according to claim 87, wherein the cell is of a strain of M. smegmatis, M. bovis, M. bovis BCG, or M. africanum.